ON THE SYSTEM OF CURVES FOR WHICH THE
METHOD OF MOMENTS IS THE BEST
METHOD OF FITTING


By


A. L. O'TOOLE


National Research Fellow.


In Mr. R. A. Fisher's paper¹ on the mathematical foundations of theoretical statistics the following statement is found: "The method of moments applied in fitting Pearsonian curves has an efficiency exceeding 80 per cent. only in the restricted region for which $\beta_2$ lies between the limits of 2.65 and 3.42 and for which $\beta_1$ does not exceed 0.1. It was, of course, to be expected that the first two moments would have 100 per cent. efficiency for the normal curve, for they happen to be the optimum statistics for fitting the normal curve. That the moment coefficients $\beta_1$ and $\beta_2$ also tend to 100 per cent. efficiency in this region suggests that in the immediate neighborhood of the normal curve the departures from normality specified by the Pearsonian formulas agree with those of that system of curves for which the method of moments gives the solution of the method of maximum likelihood.

"The system of curves for which the method of moments is the best method of fitting may easily be deduced, for if the frequency in the range $dx$ be $y(x, \theta_1, \theta_2, \theta_3, \theta_4)\,dx$, then $\frac{\partial}{\partial\theta}\log y$ must involve $x$ only as polynomials up to the fourth degree; consequently

$$y = e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)},$$

the convergence of the probability integral requiring that the coefficient of $x^4$ should be negative, and the five quantities $a, p_1, p_2, p_3, p_4$ being connected by a single relation, representing the fact that the total probability is unity." It is with these curves having a fourth degree polynomial in the exponent that the present paper is concerned.

¹ Philosophical Transactions of the Royal Society of London, vol. 222, series A (1921), p. 355.

The first step in the study of this system of frequency functions is to find an expression for the value of the integral

$$I = \int_{-\infty}^{\infty} e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)}\,dx.$$

In other words, it is necessary to know how the integral depends on the parameters $a, p_1, p_2, p_3, p_4$.

Since $a$ depends only on the unit of measure of $x$ it will be sufficient for the moment to consider $a^2 = 1$. Furthermore a linear transformation on $x$ leaves the value of the integral unchanged. If we replace $x$ by $x - p_1/4$ the integral to be considered becomes

$$I = \int_{-\infty}^{\infty} e^{-(x^4 + px^2 + qx + r)}\,dx = k\int_{-\infty}^{\infty} e^{-(x^4 + px^2 + qx)}\,dx, \qquad \text{where } k = e^{-r}.$$

Consider now the frequency curves $y = ke^{-(x^4 + px^2 + qx)}$. These curves are typically bimodal and may be classified according to the number and kind of modes. The positions of the modes are given by the solutions of the equation $dy/dx = 0$, that is by the roots of the equation $4x^3 + 2px + q = 0$. The discriminant of this cubic equation tells us that there will be three distinct real roots, and thus two distinct maxima with a minimum between them, that is two distinct modes for the quartic exponential curve, if $8p^3 + 27q^2 < 0$. Two roots will be real and equal if $8p^3 + 27q^2 = 0$. The three roots will be real and equal if $p = q = 0$, the three roots being $x = 0$. In the case of three real distinct roots, if two of the roots are equal in magnitude but opposite in sign then $q = 0$ and the curve is symmetrical with respect to the y-axis. If $p = 0$, $q \neq 0$ there will be one real root and two imaginary roots given by the three cube roots of $-q/4$. That there will be a real maximum at the value of $x$ given by the real cube root of $-q/4$ is easily seen from the nature of the curve or by considering points at values of $x$ on each side of this real cube root of $-q/4$.
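The classification by the sign of $8p^3 + 27q^2$ can be sketched in a few lines of Python. This is only an illustration: the function name `classify_modes` and its return strings are hypothetical, and exact comparison with zero is assumed to be adequate for the simple inputs shown.

```python
def classify_modes(p: float, q: float) -> str:
    """Classify the real roots of 4x^3 + 2px + q = 0 for y = k*exp(-(x^4 + p*x^2 + q*x))."""
    if p == 0 and q == 0:
        return "three equal roots at x = 0: one mode"
    d = 8 * p**3 + 27 * q**2  # its sign decides how many distinct real roots exist
    if d < 0:
        return "three distinct real roots: two modes"
    if d == 0:
        return "two equal roots: one mode and a horizontal inflexion"
    return "one real root: one mode"


print(classify_modes(-0.5, 0.0))  # symmetrical curve with two modes
print(classify_modes(0.0, -4.0))  # p = 0, q != 0: a single mode
```

The boundary case is the one in which one mode has just flattened into the minimum, so in practical work a tolerance would replace the exact test `d == 0`.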

Hence the following classes of curves and their respective equations will be considered.

Type I: $y = ke^{-x^4}$.

The curve which is symmetrical with respect to the y-axis and has only one mode, this mode being at $x = 0$.

Type II: $y = ke^{-(x^4 - 2bx^2)}$, $b > 0$.

The curve which is symmetrical with respect to the y-axis and has two distinct modes at $x = \pm\sqrt{b}$.

Type III: $y = ke^{-(x^4 - 4cx)}$, $c \neq 0$.

The asymmetrical curve with one real mode at $x = \sqrt[3]{c}$.

Type IV: $y = ke^{-(x^4 + px^2 + qx)}$.

The general type of curve with the quartic exponent.
Type I:

First evaluate the definite integral

$$I_0 = \int_{-\infty}^{\infty} e^{-x^4}\,dx = 2\int_0^{\infty} e^{-x^4}\,dx.$$

Let $x = y^{1/4}$, $dx = \tfrac14 y^{-3/4}\,dy$. Then

$$I_0 = \tfrac12\int_0^{\infty} y^{-3/4}e^{-y}\,dy = \tfrac12\Gamma(\tfrac14).$$

Similarly it may be shown that

$$\int_0^{\infty} x^p e^{-x^4}\,dx = \tfrac14\Gamma\!\left(\frac{p+1}{4}\right), \qquad p > -1.$$

Hence

$$I_1 = \int_{-\infty}^{\infty} xe^{-x^4}\,dx = 0 \quad \text{since the integrand is an odd function,}$$

$$I_2 = \int_{-\infty}^{\infty} x^2e^{-x^4}\,dx = \tfrac12\Gamma(\tfrac34),$$

$$I_{2n+1} = \int_{-\infty}^{\infty} x^{2n+1}e^{-x^4}\,dx = 0,$$

$$I_{2n} = \int_{-\infty}^{\infty} x^{2n}e^{-x^4}\,dx = \tfrac12\Gamma\!\left(\frac{2n+1}{4}\right).$$

Hence if the total frequency is unity then $k = 2/\Gamma(\tfrac14)$,

$$\mu_1 = I_1/I_0 = 0,$$

$$\mu_2 = I_2/I_0 = \Gamma(\tfrac34)/\Gamma(\tfrac14) = 0.3379891 \text{ approximately},$$

$$\mu_4 = I_4/I_0 = 1/4,$$

$$\mu_{2n+1} = I_{2n+1}/I_0 = 0,$$

$$\mu_{2n} = I_{2n}/I_0 = \Gamma\!\left(\frac{2n+1}{4}\right)\Big/\Gamma(\tfrac14).$$
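The Type I results are easy to check numerically. The following is a minimal sketch using only the Python standard library; the helper name `type1_even_moment` is an illustrative assumption, not notation from the paper.

```python
import math


def type1_even_moment(n: int) -> float:
    """mu_{2n} for y = k*exp(-x**4): Gamma((2n+1)/4) / Gamma(1/4)."""
    return math.gamma((2 * n + 1) / 4) / math.gamma(0.25)


k = 2 / math.gamma(0.25)        # makes the total frequency unity
mu2 = type1_even_moment(1)      # Gamma(3/4) / Gamma(1/4)
mu4 = type1_even_moment(2)      # Gamma(5/4) / Gamma(1/4) = 1/4
print(k, mu2, mu4)
```

The value of `mu4` is exactly one quarter because $\Gamma(\tfrac54) = \tfrac14\Gamma(\tfrac14)$.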


Type II:

Consider the definite integral

$$I_0 = \int_{-\infty}^{\infty} e^{-(x^4 - 2bx^2)}\,dx.$$

Integrate by parts letting $u = e^{-(x^4 - 2bx^2)}$ and $dv = dx$. Then

$$I_0 = \int_{-\infty}^{\infty} x(4x^3 - 4bx)e^{-(x^4 - 2bx^2)}\,dx = 4\int_{-\infty}^{\infty} x^4e^{-(x^4 - 2bx^2)}\,dx - 4b\int_{-\infty}^{\infty} x^2e^{-(x^4 - 2bx^2)}\,dx.$$

Now obviously $I_0$ cannot be zero. Hence dividing by $I_0$ we find $1 = 4\mu_4 - 4b\mu_2$, and therefore

$$b = \frac{4\mu_4 - 1}{4\mu_2} \qquad (\mu_2 \text{ cannot be zero}).$$

Now that $b$ is known (calculated from the given data by this last formula) it is possible in any particular problem to find by mechanical quadrature the value of the integral $I_0$ to any desired degree of approximation. The simple rectangle formula with even a small number of ordinates known will give a good approximation.
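The relation $b = (4\mu_4 - 1)/(4\mu_2)$ and the rectangle formula can be tried together in a short sketch. The function name `type2_moments`, the spacing `h = 0.1` and the cut-off `x_max = 3.0` are assumptions made for this illustration; the ordinates beyond the cut-off are negligible for $b$ of order unity.

```python
import math


def type2_moments(b, h=0.1, x_max=3.0):
    """Rectangle-rule estimates of I0, mu2 and mu4 for y = exp(-(x**4 - 2*b*x**2))."""
    n = int(round(x_max / h))
    xs = [i * h for i in range(-n, n + 1)]
    w = [math.exp(-(x**4 - 2 * b * x**2)) for x in xs]   # ordinates
    i0 = h * sum(w)
    i2 = h * sum(x**2 * wi for x, wi in zip(xs, w))
    i4 = h * sum(x**4 * wi for x, wi in zip(xs, w))
    return i0, i2 / i0, i4 / i0


i0, mu2, mu4 = type2_moments(0.25)
b_hat = (4 * mu4 - 1) / (4 * mu2)   # should recover the b used to build the curve
print(i0, mu2, mu4, b_hat)
```

With ordinates only 0.1 apart the recovered value agrees with $b = 0.25$ to several decimal places, which illustrates why so simple a quadrature formula is adequate for these rapidly decaying curves.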

Return now to the integration by parts just performed. Since $dI_0/db = 2I_2$ and $d^2I_0/db^2 = 4I_4$, the result takes the form

$$I_0 = \frac{d^2I_0}{db^2} - 2b\frac{dI_0}{db},$$

or

$$\frac{d^2I_0}{db^2} - 2b\frac{dI_0}{db} - I_0 = 0,$$

which is a Riccati² differential equation. Riccati's equation is $\frac{d^2u}{dx^2} - a^2x^{2m-2}u = 0$. It has a solution capable of expression in finite form in terms of elementary functions if $m$ is the reciprocal of an odd positive integer. In our equation, once the term in the first derivative has been removed, $m = 2$, $a = 1$; hence no finite solution for the differential equation is possible. That is, no finite expression in terms of elementary functions can be obtained for $I_0$. The solution of the Riccati equation here is

$$I_0 = C_1\left(1 + \frac{b^2}{2!} + \frac{5b^4}{4!} + \frac{9\cdot5\,b^6}{6!} + \frac{13\cdot9\cdot5\,b^8}{8!} + \cdots\right) + C_2\left(b + \frac{3b^3}{3!} + \frac{7\cdot3\,b^5}{5!} + \frac{11\cdot7\cdot3\,b^7}{7!} + \frac{15\cdot11\cdot7\cdot3\,b^9}{9!} + \cdots\right). \qquad (1)$$

To determine $C_1$ and $C_2$ we note that when $b = 0$ then

$$I_0 = \tfrac12\Gamma(\tfrac14) = C_1$$

and that when $b = 0$ then

$$\frac{dI_0}{db} = 2I_2 = \Gamma(\tfrac34) = C_2.$$

² Johnson's Differential Equations, p. 227.


It is worth while to make certain transformations on the differential equation

$$\frac{d^2I_0}{db^2} - 2b\frac{dI_0}{db} - I_0 = 0.$$

Let $I_0 = e^{b^2/2}v$. Then

$$\frac{d^2v}{db^2} - b^2v = 0.$$

Let $b^2 = 2t$. Then

$$\frac{d^2v}{dt^2} + \frac{1}{2t}\frac{dv}{dt} - v = 0.$$

Let $v = t^{1/4}w$. Then

$$t^2\frac{d^2w}{dt^2} + t\frac{dw}{dt} - \left[t^2 + (1/4)^2\right]w = 0.$$

Let $t = iz$ where $i = \sqrt{-1}$. Then

$$z^2\frac{d^2w}{dz^2} + z\frac{dw}{dz} + \left[z^2 - (1/4)^2\right]w = 0.$$

This last equation is Bessel's differential equation³ with $n = 1/4$. Hence its solution is

$$w = AJ_{1/4}(z) + BJ_{-1/4}(z), \qquad \text{where} \qquad J_n(z) = \sum_{r=0}^{\infty} \frac{(-1)^r (z/2)^{n+2r}}{r!\,\Gamma(n+r+1)}.$$

The above transformations give $I_0 = e^{b^2/2}(b^2/2)^{1/4}\,w$ with $z = -ib^2/2$. Hence

$$I_0 = e^{b^2/2}\left(\frac{b^2}{2}\right)^{1/4}\left[AJ_{1/4}\!\left(-\frac{ib^2}{2}\right) + BJ_{-1/4}\!\left(-\frac{ib^2}{2}\right)\right].$$

Setting $b = 0$ in $I_0$ determines $B$, and setting $b = 0$ in $dI_0/db$ determines $A$. Putting in these values for $A$ and $B$ we find finally

$$I_0 = e^{b^2/2}\left[\tfrac12\Gamma(\tfrac14)\left\{1 + \frac{b^4}{3\cdot4} + \frac{b^8}{3\cdot4\cdot7\cdot8} + \cdots\right\} + \Gamma(\tfrac34)\,b\left\{1 + \frac{b^4}{4\cdot5} + \frac{b^8}{4\cdot5\cdot8\cdot9} + \cdots\right\}\right]. \qquad (2)$$

It is worth noting, for purposes of computation, that the expression (2) converges much more rapidly than the form (1) given above,⁴ on account of the factoring out of $e^{b^2/2}$. In addition the series in (2) have the advantage that the powers increase by 4 instead of by 2 as in (1). It will be shown presently that ordinarily $b$ is less than unity. But even for $b = 1$ it will not be necessary to carry either series in (2) beyond a few terms to get at least seven decimal places of accuracy. For $b$ less than one even fewer terms will suffice for this degree of accuracy.

³ Johnson's Differential Equations, p. 235.
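The two series for $I_0$ can be compared numerically. The sketch below assumes that form (1) is the term-by-term integral $I_0 = \tfrac12\sum_n \frac{(2b)^n}{n!}\Gamma\!\left(\frac{2n+1}{4}\right)$ and that form (2) is $e^{b^2/2}$ times the two series solutions of $v'' = b^2v$ weighted by $\tfrac12\Gamma(\tfrac14)$ and $\Gamma(\tfrac34)$; the function names and the numbers of terms are illustrative choices.

```python
import math

HALF_G14 = 0.5 * math.gamma(0.25)   # I0 at b = 0
G34 = math.gamma(0.75)              # dI0/db at b = 0


def i0_form1(b, terms=60):
    """Form (1): powers of b rising by one, from integrating exp(2*b*x**2) term by term."""
    return sum((2 * b) ** n / math.factorial(n) * 0.5 * math.gamma((2 * n + 1) / 4)
               for n in range(terms))


def i0_form2(b, terms=12):
    """Form (2): exp(b**2/2) times two series whose powers rise by four."""
    even, odd = 0.0, 0.0
    t_even, t_odd = 1.0, b
    for r in range(terms):
        even += t_even
        odd += t_odd
        t_even *= b**4 / ((4 * r + 3) * (4 * r + 4))   # 3*4, 7*8, 11*12, ...
        t_odd *= b**4 / ((4 * r + 4) * (4 * r + 5))    # 4*5, 8*9, 12*13, ...
    return math.exp(b * b / 2) * (HALF_G14 * even + G34 * odd)


for b in (0.0, 0.5, 1.0):
    print(b, i0_form1(b), i0_form2(b))
```

Twelve terms of each series in the second form are ample for $b \le 1$, whereas the first form needs several times as many terms for comparable accuracy.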


⁴ The form (1) is obtained immediately if we write

$$I_0 = \int_{-\infty}^{\infty} e^{-x^4}e^{2bx^2}\,dx = 2\int_0^{\infty} e^{-x^4}\left(1 + 2bx^2 + \frac{(2bx^2)^2}{2!} + \cdots + \frac{(2bx^2)^n}{n!} + \cdots\right)dx,$$

assume term by term integration permissible, and make use of the fact already mentioned that $\int_0^{\infty} x^pe^{-x^4}\,dx = \tfrac14\Gamma\!\left(\frac{p+1}{4}\right)$.


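From the point of view of the Riccati differential equation it can be shown that $I_0 = \int_{-\infty}^{\infty} e^{-(x^4 - 2bx^2)}\,dx$ is the solution of

$$\frac{d^2I_0}{db^2} - 2b\frac{dI_0}{db} - I_0 = 0$$

when the solution is sought in the form of a definite integral.⁵ For the differential equation $\phi(D)v + b\,\psi(D)v = 0$, where $D = d/db$ and $\phi$ and $\psi$ are polynomials in $D$ with constant coefficients, is satisfied by

$$v = c\int_{\alpha}^{\beta} e^{bt + \int\phi(t)T(t)\,dt}\,T(t)\,dt$$

where $c$ is a constant, $T(t)$ is the reciprocal of $\psi(t)$, and $\alpha$ and $\beta$ are so chosen that for all values of $b$

$$\left[e^{bt + \int\phi(t)T(t)\,dt}\right]_{\alpha}^{\beta} = 0.$$

Let $\phi(D)v + b\,\psi(D)v = D^2v - 2bDv - v$. Then

$$\psi(t) = -2t, \qquad \phi(t) = t^2 - 1, \qquad T(t) = -\frac{1}{2t}, \qquad \int\phi(t)T(t)\,dt = -\frac12\left(\frac{t^2}{2} - \log t\right),$$

and $\left[e^{bt - \frac12\left(\frac{t^2}{2} - \log t\right)}\right]_{\alpha}^{\beta} = 0$ for all values of $b$ if $\alpha = 0$, $\beta = \infty$.

Hence

$$v = c\int_0^{\infty} e^{bt - \frac12\left(\frac{t^2}{2} - \log t\right)}\left(-\frac{1}{2t}\right)dt = -\frac{c}{2}\int_0^{\infty} t^{-1/2}e^{bt - t^2/4}\,dt.$$

Now let $t = 2x^2$ and $c = -\sqrt{2}$. Then

$$v = 2\int_0^{\infty} e^{-(x^4 - 2bx^2)}\,dx = \int_{-\infty}^{\infty} e^{-(x^4 - 2bx^2)}\,dx = I_0.$$

⁵ A. R. Forsyth's Differential Equations, 6th edition, 1929, pp. 277-2890.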
An idea of the variation of $I_0$ as a function of $b$ can be obtained from the following table calculated from (1) for values of $b$ at intervals of 0.1 from 0 to 1 and using $\Gamma(\tfrac14) = 3.625610$, $\Gamma(\tfrac34) = 1.225417$. The results are plotted in the accompanying graph, Fig. I.

  b      I_0
 0.0   1.812 805
 0.1   1.945 063
 0.2   2.099 726
 0.3   2.282 225
 0.4   2.499 648
 0.5   2.761 349
 0.6   3.079 783
 0.7   3.471 748
 0.8   3.960 152
 0.9   4.576 578
 1.0   5.365 158

The modes are at $x = \pm\sqrt{b}$. Ordinarily the ordinate at the modes will not be greater than $e = 2.718\ldots$ times the ordinate at $x = 0$. Hence ordinarily it will not be necessary to consider values of $b$ greater than unity.

FIG. I. $I_0$ as a function of $b$.
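The integrals required for the moments are

$$I_1 = \int_{-\infty}^{\infty} xe^{-(x^4 - 2bx^2)}\,dx = 0,$$

$$I_2 = \int_{-\infty}^{\infty} x^2e^{-(x^4 - 2bx^2)}\,dx = \frac12\frac{dI_0}{db},$$

$$I_3 = \int_{-\infty}^{\infty} x^3e^{-(x^4 - 2bx^2)}\,dx = 0,$$

$$I_4 = \int_{-\infty}^{\infty} x^4e^{-(x^4 - 2bx^2)}\,dx = \left(\frac12\right)^2\frac{d^2I_0}{db^2},$$

$$I_{2n} = \int_{-\infty}^{\infty} x^{2n}e^{-(x^4 - 2bx^2)}\,dx = \left(\frac12\right)^n\frac{d^nI_0}{db^n}, \qquad n = 0, 1, 2, \ldots,$$

$$I_{2n+1} = \int_{-\infty}^{\infty} x^{2n+1}e^{-(x^4 - 2bx^2)}\,dx = 0, \qquad n = 0, 1, 2, \ldots.$$

To find these derivatives one might use the relations⁶

$$\frac{d}{dz}J_n(z) = \frac{n}{z}J_n(z) - J_{n+1}(z) = J_{n-1}(z) - \frac{n}{z}J_n(z).$$

But term by term differentiation is permissible and for this purpose it is simpler to use (1) rather than (2). We find

$$I_0 = \tfrac12\Gamma(\tfrac14)\left(1 + \frac{b^2}{2!} + \frac{5b^4}{4!} + \frac{9\cdot5\,b^6}{6!} + \cdots\right) + \Gamma(\tfrac34)\left(b + \frac{3b^3}{3!} + \frac{7\cdot3\,b^5}{5!} + \frac{11\cdot7\cdot3\,b^7}{7!} + \cdots\right),$$

$$\frac{dI_0}{db} = \tfrac12\Gamma(\tfrac14)\left(b + \frac{5b^3}{3!} + \frac{9\cdot5\,b^5}{5!} + \frac{13\cdot9\cdot5\,b^7}{7!} + \cdots\right) + \Gamma(\tfrac34)\left(1 + \frac{3b^2}{2!} + \frac{7\cdot3\,b^4}{4!} + \frac{11\cdot7\cdot3\,b^6}{6!} + \cdots\right),$$

$$\frac{d^2I_0}{db^2} = \tfrac12\Gamma(\tfrac14)\left(1 + \frac{5b^2}{2!} + \frac{9\cdot5\,b^4}{4!} + \frac{13\cdot9\cdot5\,b^6}{6!} + \cdots\right) + \Gamma(\tfrac34)\left(3b + \frac{7\cdot3\,b^3}{3!} + \frac{11\cdot7\cdot3\,b^5}{5!} + \cdots\right).$$

Since $b \geq 0$, $I_0$ and all its derivatives are greater than zero. Now the total probability is to be unity; hence take $k = 1/I_0$. Then

$$\mu_1 = \frac{I_1}{I_0} = 0, \qquad \mu_2 = \frac{I_2}{I_0} = \frac{1}{2I_0}\frac{dI_0}{db}, \qquad \mu_3 = \frac{I_3}{I_0} = 0, \qquad \mu_4 = \frac{I_4}{I_0} = \frac{1}{4I_0}\frac{d^2I_0}{db^2},$$

etc.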
⁶ Whittaker and Watson, Modern Analysis, third edition, p. 360.

Type III: $y = ke^{-(x^4 - 4cx)}$.

This curve is not symmetrical. But, obviously, changing $c$ to $-c$ has the same effect as changing $x$ to $-x$, or simply reversing the shape of the curve and the distribution from which it arises. Hence it will be necessary to consider only positive values of $c$. As stated already, it is easy to show that there is a real mode at the point given by $x$ equal to the real cube root of $c$. If $c = 1$ then $y = ke^3$ at the mode, that is $e^3$ times the value of the ordinate at $x = 0$. Hence usually $c$ will not be as great as unity.

FIG. II

Let

$$I_0 = \int_{-\infty}^{\infty} e^{-(x^4 - 4cx)}\,dx.$$

Integrate by parts letting $u = e^{-(x^4 - 4cx)}$, $dv = dx$. Then

$$I_0 = 4\int_{-\infty}^{\infty} x^4e^{-(x^4 - 4cx)}\,dx - 4c\int_{-\infty}^{\infty} xe^{-(x^4 - 4cx)}\,dx.$$

Hence, writing $\mu_n'$ for the $n$-th moment about the origin,

$$c = \frac{4\mu_4' - 1}{4\mu_1'}.$$

With this value of $c$ calculated from the given data mechanical quadrature can be used to find an approximate value for $I_0$. Then let $k = 1/I_0$.

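The result of the integration by parts could have been written in the form of the differential equation

$$\frac{d^3I_0}{dc^3} - 64cI_0 = 0.$$

Conversely, it is easy to show that $I_0 = \int_{-\infty}^{\infty} e^{-(x^4 - 4cx)}\,dx$ is the definite integral form of the solution of the differential equation

$$\frac{d^3v}{dc^3} - 64cv = 0.$$

For, writing the equation as $\phi(D)v + c\,\psi(D)v = 0$ with $D = d/dc$, here

$$\psi(D) = -64, \qquad \phi(D) = D^3, \qquad T(t) = -\frac{1}{64}, \qquad \int\phi(t)T(t)\,dt = -\frac{t^4}{256},$$

and $\left[e^{ct - t^4/256}\right]_{\alpha}^{\beta} = 0$ for all values of $c$ if $\alpha = -\infty$, $\beta = \infty$. Hence

$$v = -\frac{C}{64}\int_{-\infty}^{\infty} e^{ct - t^4/256}\,dt,$$

where $C$ is a constant. Now let $t = 4x$ and $C = -16$. Then

$$v = \int_{-\infty}^{\infty} e^{-(x^4 - 4cx)}\,dx = I_0.$$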


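An expression for the value of $I_0$ can be obtained either by finding the series solution of the differential equation and determining the constants by setting $c = 0$ in $I_0$ and its derivatives, or by expanding $e^{4cx}$ in series in the definite integral itself and then integrating term by term.

$$I_0 = \int_{-\infty}^{\infty} e^{-(x^4 - 4cx)}\,dx = \int_{-\infty}^{0} e^{-(x^4 - 4cx)}\,dx + \int_0^{\infty} e^{-(x^4 - 4cx)}\,dx$$

$$= \int_0^{\infty} e^{-(x^4 + 4cx)}\,dx + \int_0^{\infty} e^{-(x^4 - 4cx)}\,dx$$

$$= \int_0^{\infty} e^{-x^4}\left(e^{-4cx} + e^{4cx}\right)dx = 2\int_0^{\infty} e^{-x^4}\cosh(4cx)\,dx$$

$$= 2\int_0^{\infty} e^{-x^4}\left[1 + \frac{(4cx)^2}{2!} + \frac{(4cx)^4}{4!} + \frac{(4cx)^6}{6!} + \cdots\right]dx$$

$$= \frac12\sum_{n=0}^{\infty} \frac{(4c)^{2n}}{(2n)!}\,\Gamma\!\left(\frac{2n+1}{4}\right)$$

$$= \tfrac12\Gamma(\tfrac14)\left[1 + \frac{(4c)^4}{4\cdot4!} + \frac{5(4c)^8}{4^2\cdot8!} + \frac{9\cdot5(4c)^{12}}{4^3\cdot12!} + \cdots\right] + \tfrac12\Gamma(\tfrac34)\left[\frac{(4c)^2}{2!} + \frac{3(4c)^6}{4\cdot6!} + \frac{7\cdot3(4c)^{10}}{4^2\cdot10!} + \cdots\right].$$

These series may be differentiated term by term to obtain the derivatives of $I_0$ and hence

$$I_1 = \int_{-\infty}^{\infty} xe^{-(x^4 - 4cx)}\,dx = \frac14\frac{dI_0}{dc},$$

$$I_2 = \int_{-\infty}^{\infty} x^2e^{-(x^4 - 4cx)}\,dx = \left(\frac14\right)^2\frac{d^2I_0}{dc^2},$$

$$I_n = \int_{-\infty}^{\infty} x^ne^{-(x^4 - 4cx)}\,dx = \left(\frac14\right)^n\frac{d^nI_0}{dc^n},$$

etc.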
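The Type III series can be checked against direct quadrature. The sketch below assumes the series in the compact form $I_0 = \tfrac12\sum_n \frac{(4c)^{2n}}{(2n)!}\Gamma\!\left(\frac{2n+1}{4}\right)$; the function names, the number of terms, the spacing and the cut-off are choices made only for this illustration.

```python
import math


def i0_type3_series(c, terms=40):
    """I0 for y = exp(-(x**4 - 4*c*x)) from the cosh expansion integrated term by term."""
    return 0.5 * sum((4 * c) ** (2 * n) / math.factorial(2 * n)
                     * math.gamma((2 * n + 1) / 4)
                     for n in range(terms))


def i0_type3_quadrature(c, h=0.05, x_max=5.0):
    """Rectangle-rule value of the same integral over -x_max <= x <= x_max."""
    n = int(round(x_max / h))
    return h * sum(math.exp(-((i * h) ** 4 - 4 * c * (i * h)))
                   for i in range(-n, n + 1))


for c in (0.0, 0.5, 1.0):
    print(c, i0_type3_series(c), i0_type3_quadrature(c))
```

Agreement of the two columns to many decimal places is a convenient check on both the series coefficients and the quadrature step.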
If $x$ is replaced by $x - \sqrt[3]{c}$ the effect is to translate the modal value of $x$ to the origin. The equation of the curve then becomes

$$y = ke^{-(x^4 + c_1x^3 + c_2x^2 + c_3x + c_4)}$$

where

$$c_1 = -4\sqrt[3]{c}, \qquad c_2 = 6\sqrt[3]{c^2}, \qquad c_3 = -8c, \qquad c_4 = 5c\sqrt[3]{c}.$$
Type IV: $y = ke^{-(x^4 + px^2 + qx)}$.

Consider the definite integral

$$I_0 = \int_{-\infty}^{\infty} e^{-(x^4 + px^2 + qx)}\,dx.$$

If $p = q = 0$ we get Type I. If $p \neq 0$, $q = 0$ we get Type II. If $p = 0$, $q \neq 0$ we get Type III. Hence consider now $p \neq 0$, $q \neq 0$.

Integrate $I_0$ by parts with $u = e^{-(x^4 + px^2)}$, $dv = e^{-qx}\,dx$. Then

$$I_0 = -\frac{1}{q}\int_{-\infty}^{\infty} (4x^3 + 2px)e^{-(x^4 + px^2 + qx)}\,dx = -\frac{4}{q}\int_{-\infty}^{\infty} x^3e^{-(x^4 + px^2 + qx)}\,dx - \frac{2p}{q}\int_{-\infty}^{\infty} xe^{-(x^4 + px^2 + qx)}\,dx.$$

Now divide by $I_0$ and multiply by $q$. Then

$$q = -4\mu_3' - 2p\mu_1'.$$

Begin again with $I_0$ and integrate by parts, this time with $u = e^{-(x^4 + px^2 + qx)}$, $dv = dx$. Then

$$I_0 = \int_{-\infty}^{\infty} (4x^4 + 2px^2 + qx)e^{-(x^4 + px^2 + qx)}\,dx = 4\int_{-\infty}^{\infty} x^4e^{-(x^4 + px^2 + qx)}\,dx + 2p\int_{-\infty}^{\infty} x^2e^{-(x^4 + px^2 + qx)}\,dx + q\int_{-\infty}^{\infty} xe^{-(x^4 + px^2 + qx)}\,dx.$$

Divide by $I_0$. Then

$$1 = 4\mu_4' + 2p\mu_2' + q\mu_1'.$$

Now substitute $q = -4\mu_3' - 2p\mu_1'$ in $1 = 4\mu_4' + 2p\mu_2' + q\mu_1'$ and we get

$$p = \frac{1 + 4\mu_1'\mu_3' - 4\mu_4'}{2(\mu_2' - \mu_1'^2)},$$

$$q = -4\mu_3' - 2p\mu_1' = -4\mu_3' - \frac{\mu_1'\,(1 + 4\mu_1'\mu_3' - 4\mu_4')}{\mu_2' - \mu_1'^2}. \qquad (3)$$
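As a check on the relations $p = \frac{1 + 4\mu_1'\mu_3' - 4\mu_4'}{2(\mu_2' - \mu_1'^2)}$ and $q = -4\mu_3' - 2p\mu_1'$, one can build a Type IV curve with known $p$ and $q$, compute its moments about the origin by quadrature, and recover the parameters. The following is a minimal sketch under that assumption; the function names, the spacing and the cut-off are illustrative only.

```python
import math


def raw_moments(p, q, h=0.05, x_max=4.0):
    """Moments about the origin (orders 0..4) of y proportional to exp(-(x^4 + p x^2 + q x))."""
    n = int(round(x_max / h))
    xs = [i * h for i in range(-n, n + 1)]
    w = [math.exp(-(x**4 + p * x**2 + q * x)) for x in xs]
    i0 = sum(w)
    return [sum(x**k * wi for x, wi in zip(xs, w)) / i0 for k in range(5)]


def pq_from_moments(m):
    """Recover p and q from the first four moments about the origin."""
    _, m1, m2, m3, m4 = m
    p = (1 + 4 * m1 * m3 - 4 * m4) / (2 * (m2 - m1**2))
    q = -4 * m3 - 2 * p * m1
    return p, q


p_hat, q_hat = pq_from_moments(raw_moments(-1.0, 0.3))
print(p_hat, q_hat)
```

In an application the moments would of course come from the observed data rather than from a curve of known parameters, and the recovered values would then carry the sampling error of the fourth moment.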
The result of the two integrations by parts can be written in the form of two simultaneous partial differential equations. They are

$$4\frac{\partial^3I_0}{\partial q^3} + 2p\frac{\partial I_0}{\partial q} - qI_0 = 0,$$

$$4\frac{\partial^2I_0}{\partial p^2} - 2p\frac{\partial I_0}{\partial p} - q\frac{\partial I_0}{\partial q} - I_0 = 0.$$

Let $J_n = \int_{-\infty}^{\infty} x^ne^{-(x^4 + px^2)}\,dx$, so that $J_n = 0$ when $n$ is odd. Then

$$I_0 = \int_{-\infty}^{\infty} e^{-(x^4 + px^2)}e^{-qx}\,dx = \sum_{r=0}^{\infty} \frac{q^{2r}}{(2r)!}J_{2r},$$

$$I_{2n} = \int_{-\infty}^{\infty} x^{2n}e^{-(x^4 + px^2 + qx)}\,dx = \frac{\partial^{2n}I_0}{\partial q^{2n}} = \sum_{r=0}^{\infty} \frac{q^{2r}}{(2r)!}J_{2n+2r},$$

$$I_{2n+1} = \int_{-\infty}^{\infty} x^{2n+1}e^{-(x^4 + px^2 + qx)}\,dx = (-1)^{2n+1}\frac{\partial^{2n+1}I_0}{\partial q^{2n+1}} = -\sum_{r=0}^{\infty} \frac{q^{2r+1}}{(2r+1)!}J_{2n+2r+2}.$$


When the values of $p$ and $q$ are calculated from the data of any given problem by the formulas (3) then values for $I_0, I_1, I_2, I_3, I_4$, etc. can be obtained by mechanical quadrature.

For two real, distinct modes $8p^3 + 27q^2 < 0$. Hence $p$ must be negative and $q^2 < -8p^3/27$. If $8p^3 + 27q^2 = 0$ then one mode flattens, forming a point of inflexion with a horizontal tangent at the minimum point. Changing $q$ to $-q$ has the same effect as changing $x$ to $-x$ and hence $q$ is a component of skewness of the curve.


If the curve is so placed that the sum of the values of $x$ at the modes and at the minimum point is zero then the equation of the curve will be of the form

$$y = ke^{-(x^4 + px^2 + qx)}.$$

If now we change the scale of $x$ by replacing $x$ by $x\sqrt{a}$ then we are led to the functions of the form

$$y = y_0e^{-a^2(x^4 + px^2 + qx)}.$$

Performing the two integrations by parts, as before, on the integral

$$\int_{-\infty}^{\infty} e^{-a^2(x^4 + px^2 + qx)}\,dx$$

leads to the relations

$$q = -4\mu_3' - 2p\mu_1' = -4\mu_3' - \frac{\mu_1'\,(1 + 4a^2\mu_1'\mu_3' - 4a^2\mu_4')}{a^2(\mu_2' - \mu_1'^2)},$$

$$p = \frac{1 + 4a^2\mu_1'\mu_3' - 4a^2\mu_4'}{2a^2(\mu_2' - \mu_1'^2)}. \qquad (4)$$


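If

$$I_n(p, q) = \int_{-\infty}^{\infty} x^ne^{-(x^4 + px^2 + qx)}\,dx, \qquad n = 0, 1, 2, 3, \ldots,$$

then

$$\int_{-\infty}^{\infty} x^ne^{-a^2(x^4 + px^2 + qx)}\,dx = a^{-(n+1)/2}\,I_n(ap,\ a^{3/2}q).$$

In particular, when $p = q = 0$,

$$\int_{-\infty}^{\infty} x^{2n}e^{-a^2x^4}\,dx = a^{-(2n+1)/2}\,I_{2n}(0, 0) = \frac{1}{2a^{(2n+1)/2}}\,\Gamma\!\left(\frac{2n+1}{4}\right),$$

so that

$$\mu_{2n} = a^{-n}\,\Gamma\!\left(\frac{2n+1}{4}\right)\Big/\Gamma(\tfrac14).$$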
In the case of Type I when $y = y_0e^{-a^2x^4}$, $p = q = 0$ and hence from (4), or as can be shown directly, $a^2 = \dfrac{1}{4\mu_4}$. In Type II, $y = y_0e^{-a^2(x^4 + px^2)}$, $q = 0$ and hence $a^2 = \dfrac{1}{2p\mu_2 + 4\mu_4}$. In the Type III where $y = y_0e^{-a^2(x^4 + qx)}$, $p = 0$ and hence

$$a^2 = \frac{1}{4\mu_4' + q\mu_1'} = \frac{1}{4(\mu_4' - \mu_1'\mu_3')}.$$


In general, since $p$ and $q$ are determined when the modes and minimum point of the curve are known, theoretically at least, $a^2$ is fixed by the relations (4). In practice, however, this would mean that the accuracy in the determination of $a^2$ would be contingent upon the accuracy with which the modes and minimum point are determined. Hence other methods for fixing $a^2$ will be required in general. Now if in $a^{-1/2}I_0(ap,\ a^{3/2}q)$ we replace $p$ and $q$ by (4), which involve only $a^2$ and quantities calculable from the given data, we have a function of $a$ alone, say $F(a)$. It will be sufficient then if we determine a value of $a$ such that $F(a) = N$ where $N$ is the total given frequency. Then fix $p$ and $q$ by (4) and the modes and minimum point by $4x^3 + 2px + q = 0$.


The points of inflexion are found from the equation

$$\frac{d^2y}{dx^2} = 0$$

and for Type I are given by $x = \pm\left(\dfrac{3}{4a^2}\right)^{1/4}$. Hence $x = \pm\dfrac{0.9306}{\sqrt{a}}$ approximately. For Type II they are given by $16a^2x^6 + 16a^2px^4 + (4a^2p^2 - 12)x^2 - 2p = 0$. For Type III they are given by $16a^2x^6 + 8a^2qx^3 - 12x^2 + a^2q^2 = 0$. And in general they are given by roots of the equation

$$16a^2x^6 + 16a^2px^4 + 8a^2qx^3 + (4a^2p^2 - 12)x^2 + 4a^2pqx + a^2q^2 - 2p = 0.$$

It will be noticed that the distribution given by

$$y = y_0e^{-a^2(x^4 + px^2 + qx)}$$

can have the mean at the origin if and only if $q = 0$, that is, if and only if the distribution is symmetrical. Now replace $x$ by $x - m$. The area remains the same and hence also $y_0$. The equation then is

$$y = y_0e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)}$$

where

$$p_1 = -4m, \qquad p_2 = 6m^2 + p, \qquad p_3 = q - 2mp - 4m^3, \qquad p_4 = m^4 + m^2p - mq,$$

and $p$ and $q$ are given by the relations (4) above. An integration by parts with $u = e^{-a^2(x^4 + p_1x^3 + p_2x^2 + p_3x + p_4)}$, $dv = dx$ shows that

$$a^2\left(4\mu_4' + 3p_1\mu_3' + 2p_2\mu_2' + p_3\mu_1'\right) = 1.$$
This small beginning of the study of the system of frequency curves with the quartic exponent will be concluded here with the construction of artificial illustrations of Types I and II.


TYPE I

$$y = \frac{2}{\Gamma(\tfrac14)}\,e^{-x^4}, \qquad \log_{10}\Gamma(\tfrac14) = 0.5593811.$$

  x       y         x²y        x⁴y
 0.1   .5515762   .0055158   .0000551
 0.2   .5507494   .0220300   .0008812
 0.3   .5471811   .0492463   .0044322
 0.4   .5376888   .0860302   .0137648
 0.5   .5182096   .1295524   .0323881
 0.6   .4845787   .1744483   .0628014
 0.7   .4338852   .2126037   .1041758
 0.8   .3662367   .2343915   .1500105
 0.9   .2862255   .2318426   .1877925
 1.0   .2029338   .2029338   .2029338
 1.1   .1275846   .1543774   .1867966
 1.2   .0693579   .0998754   .1438205
 1.3   .0317147   .0535978   .0905803
 1.4   .0118376   .0232017   .0454753
 1.5   .0034917   .0078563   .0176767
 1.6   .0007861   .0020124   .0051518
 1.7   .0001301   .0003760   .0010866
 1.8   .0000152   .0000492   .0001596
 1.9   .0000012   .0000043   .0000156
 2.0   .0000001   .0000004
TYPE II

$$y = \frac{1}{2.187099}\,e^{-(x^4 - 0.5x^2)}$$

  x       y          x²y         x⁴y
 0.0   0.4572267   0.0000000   0.0000000
 0.1    .4594725    .0045947    .0000459
 0.2    .4657175    .0186287    .0007451
 0.3    .4744135    .0426972    .0038427
 0.4    .4827888    .0772462    .0123594
 0.5    .4867153    .1216788    .0304197
 0.6    .4808614    .1731101    .0623196
 0.7    .4594725    .2251415    .1103193
 0.8    .4180410    .2675462    .1712296
 0.9    .3556970    .2881146    .2333728
 1.0    .2773220    .2773220    .2773220
 1.1    .1936552    .2343228    .2835306
 1.2    .1181056    .1700721    .2449038
 1.3    .0611958    .1034209    .1747813
 1.4    .0261429    .0512401    .1004306
 1.5    .0089145    .0200576    .0451297
 1.6    .0023433    .0059988    .0153571
 1.7    .0004575    .0013222    .0038211
 1.8    .0000638    .0002067    .0006697
 1.9    .0000061    .0000220    .0000795
 2.0    .0000004    .0000016    .0000064

 Sum   5.2286133   2.0827448   1.7706859

$$y_0 = \frac{1}{2.187099} = 0.4572267.$$

Total frequency $= \mu_0 = 0.2\,(5.2286133) - 0.1\,(0.4572267) = 1.000000$,

$\mu_1 = 0$,

$\mu_2 = 0.2\,(2.0827448) = 0.4165490$,

$\mu_4 = 0.2\,(1.7706859) = 0.35413718$.

From relations (1) or (2) it is found that when $b = 0.25$ (i.e. $p = -0.5$, $q = 0$) then $I_0 = 2.187099$. Conversely, the formula $b = \dfrac{4\mu_4 - 1}{4\mu_2}$ gives, retaining six decimal places, $b = 0.250000$.

(To be Continued in May Issue)








ON THE LOGARITHMIC FREQUENCY DISTRIBUTION
AND THE SEMI-LOGARITHMIC
CORRELATION SURFACE*


By
PAE-TSI YUAN


INTRODUCTION**

The method of treating frequency curves as developed chiefly by Edgeworth, Kapteyn, Van Uven and Wicksell occupies an important place in both theoretical and applied statistics. The essence of this method may be briefly summarized as follows:

Suppose a function of the variable $z$ is distributed according to the normal law of error. Then $z$ certainly cannot be also normally distributed, unless the function is a linear function of $z$. Without losing generality, we shall write the normally distributed function in standard units as $x = f(z)$. Thus the origin of $x$ is its mean and the unit of $x$ is its standard deviation. The relative frequency of values of $x$ between $x$ and $x + dx$ is, therefore,

$$\frac{1}{\sqrt{2\pi}}\,e^{-\frac12x^2}\,dx$$

and the relative frequency of values of $z$ between $z$ and $z + dz$ is

$$F(z)\,dz = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12[f(z)]^2}f'(z)\,dz.$$

Thus if we have an observed frequency distribution of $z$ and we know a normally distributed function of $z$, then we can graduate the distribution of $z$ by using this formula. Edgeworth calls this method of graduating a frequency distribution the method of translation. In two papers on "Skew Frequency Curves in Biology and Statistics" published in 1903 and 1916, J. C. Kapteyn elegantly set forth a theoretical foundation of this method. Later Wicksell gave a similar justification. Both of them based their "genetic theory of frequency", to use Wicksell's terminology, upon a generalized hypothesis of elementary errors.

In the present paper, we are interested only in the important special case where $x = \dfrac{1}{c}\log\dfrac{z-a}{b}$. The frequency function of $z$, then, becomes:

$$F(z) = \frac{1}{\sqrt{2\pi}\,c\,(z-a)}\,e^{-\frac{1}{2c^2}\left(\log\frac{z-a}{b}\right)^2}$$

which is called the logarithmic frequency function.*

Numerous papers have been written on this frequency curve. Among the early writers were Francis Galton and McAllister. But a systematic treatment of the properties of this curve from the standpoint of mathematical statistics is still lacking. Hence, in the first part of this paper, such a treatment will be given, thus leading to some interesting relationships among the characteristics of this curve.

Various methods of determining the parameters of this frequency function have been proposed by writers on this subject. Pearson is the first writer to make use of the method of moments. Later this method was also applied by Jörgensen and Wicksell. In this paper, the method of moments will be considered and a table will be provided to facilitate the computation of the constants by this method.

Edgeworth, Wicksell and Van Uven all have contributed in extending the method of translation to correlation surfaces. Wicksell's logarithmic correlation surface is particularly noteworthy. In the last part of this paper, a semi-logarithmic correlation surface of two variables will be developed and its properties studied.

The writer wishes to express his appreciation for the assistance Professor Cecil C. Craig has given him in making this study.

* A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the University of Michigan.

** Papers written by the writers mentioned in this introduction are listed under the writers' names in the Bibliography at the end of this paper.

* For a justification of this frequency function based on Weber-Fechner's Psychophysical Law see the "Calculus of Observations" by E. T. Whittaker and G. Robinson, pp. 217-218. (Blackie & Son Ltd., London and Glasgow, 1929)


PART I
THE LOGARITHMIC FREQUENCY DISTRIBUTION

For the sake of clarity, it is desirable to state at the outset that the logarithmic frequency distribution represented by

$$F(z) = \frac{1}{\sqrt{2\pi}\,c\,(z-a)}\,e^{-\frac{1}{2c^2}\left(\log\frac{z-a}{b}\right)^2}$$

is unimodal and has three parameters. The parameter $a$ is the finite lower or upper limit of $z$ according to whether $b$ is positive or negative. In the following discussions, unless the sign of $b$ plays an important role, we shall take $b$ to be positive and $a$ to be the finite lower limit of $z$. However, the results of our discussions can be easily modified to cover the case where $b$ is negative and $a$ is the finite upper limit of $z$.

In the first eight sections of Part I the properties of the logarithmic frequency distribution will be treated from the standpoint of mathematical statistics,* and in section 9 the numerical application of this distribution will be discussed.

* Some topics under consideration here in regard to the properties of the logarithmic frequency distribution have also been discussed by many writers, among whom we may particularly mention McAllister, Kapteyn, Pearson and Pretorius. See the references under these writers' names in the Bibliography of this paper.


1. AVERAGES

We shall first give the analytic expressions of four different averages of $z$ and then observe their relative magnitudes.

By definition, the arithmetic mean of $z$ is

$$m = \int_a^{\infty} zF(z)\,dz = be^{\frac12c^2} + a.$$

The logarithm of the geometric mean of $z$ about the point $z = a$ is given by

$$\int_a^{\infty} \log(z-a)\,F(z)\,dz = \log b.$$

Hence, the geometric mean of $z$ about $z = a$ measured from $z = 0$ is

$$m_g = b + a.$$

Since the median of $z$ corresponds to $x = \dfrac{1}{c}\log\dfrac{z-a}{b} = 0$, it is equal to

$$m_d = b + a.$$

Setting the derivative

$$\frac{dF(z)}{dz} = -\frac{\log\frac{z-a}{b} + c^2}{c^2(z-a)}\,F(z)$$

equal to zero, we obtain the mode of $z$ as

$$m_o = be^{-c^2} + a.$$

Thus, the geometric mean and the median are equal. Moreover,

$$m_o < m_g = m_d < m.$$
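The four averages are simple enough to tabulate directly. The sketch below assumes $b > 0$ and the expressions mean $= a + be^{c^2/2}$, median $=$ geometric mean $= a + b$ and mode $= a + be^{-c^2}$; the function names and the quadrature settings are illustrative choices, and the mean is checked by integrating $z = a + be^{cx}$ against the normal law of $x$.

```python
import math


def averages(a, b, c):
    """Return (mode, median, mean) of z when (1/c)*log((z - a)/b) is standard normal, b > 0."""
    mode = a + b * math.exp(-c * c)
    median = a + b                      # also the geometric mean about z = a
    mean = a + b * math.exp(c * c / 2)
    return mode, median, mean


def mean_by_quadrature(a, b, c, h=0.001, x_max=12.0):
    """Rectangle-rule check of the mean using z(x) = a + b*exp(c*x) with x standard normal."""
    n = int(round(x_max / h))
    total = 0.0
    for i in range(-n, n + 1):
        x = i * h
        total += (a + b * math.exp(c * x)) * math.exp(-x * x / 2)
    return h * total / math.sqrt(2 * math.pi)


mode, median, mean = averages(1.0, 2.0, 0.5)
print(mode, median, mean, mean_by_quadrature(1.0, 2.0, 0.5))
```

Working in the normal variable $x$ rather than in $z$ avoids the awkward behaviour of the integrand near the finite limit $z = a$.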


2. POINTS OF INFLECTION

The second derivative of $F(z)$ is

$$\frac{d^2F(z)}{dz^2} = \frac{\left(\log\frac{z-a}{b}\right)^2 + 3c^2\log\frac{z-a}{b} + 2c^4 - c^2}{c^4(z-a)^2}\,F(z).$$

The roots of the equation

$$\left(\log\frac{z-a}{b}\right)^2 + 3c^2\log\frac{z-a}{b} + 2c^4 - c^2 = 0$$

are the points of inflection of the logarithmic frequency curve. We shall denote them by $z_1$ and $z_2$,

$$z_1 = be^{-\frac12c^2\left[3 + \sqrt{1 + \frac{4}{c^2}}\right]} + a,$$

$$z_2 = be^{-\frac12c^2\left[3 - \sqrt{1 + \frac{4}{c^2}}\right]} + a.$$

Note that the quantity under the radical sign is always positive and greater than one. Its square root is, therefore, greater than one, and it is greater than three whenever $c^2 < 1/2$. In that case $z_1 < b + a < z_2$. That is, the geometric mean and the median of $z$ lie between the points of inflection.

Furthermore, if we observe that the points of inflection may be written in relation to the mode as

$$z_1 - a = (m_o - a)\,e^{-\frac12c^2\left(1 + \sqrt{1 + \frac{4}{c^2}}\right)},$$

$$z_2 - a = (m_o - a)\,e^{-\frac12c^2\left(1 - \sqrt{1 + \frac{4}{c^2}}\right)},$$

we see that $z_1 < m_o < z_2$.

But the mean does not always lie between the two inflection points, since

$$z_1 - a = (m - a)\,e^{-\frac12c^2\left(4 + \sqrt{1 + \frac{4}{c^2}}\right)},$$

$$z_2 - a = (m - a)\,e^{-\frac12c^2\left(4 - \sqrt{1 + \frac{4}{c^2}}\right)}.$$

Obviously, $z_1$ is always less than the mean. But when $c^2 > 4/15$, the mean is situated above both points of inflection.

Now, the relation of the averages and the points of inflection, when $c^2 < 4/15$, may be expressed by the inequality

$$z_1 < m_o < m_g = m_d < m < z_2,$$

which holds whenever $c^2$ does not exceed $4/15$.
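The positions of the points of inflection can be verified numerically. The sketch below assumes that they are the roots of $\left(\log\frac{z-a}{b}\right)^2 + 3c^2\log\frac{z-a}{b} + 2c^4 - c^2 = 0$, that is $z = a + b\exp\!\left\{-\tfrac12c^2\left[3 \pm \sqrt{1 + 4/c^2}\right]\right\}$, and checks them with a second difference of the density; all function names and the step `h` are illustrative.

```python
import math


def density(z, a, b, c):
    """Logarithmic frequency function F(z) for z > a and b > 0."""
    u = math.log((z - a) / b)
    return math.exp(-u * u / (2 * c * c)) / (math.sqrt(2 * math.pi) * c * (z - a))


def inflection_points(a, b, c):
    """Roots of the quadratic in log((z - a)/b) given by setting F''(z) = 0."""
    r = math.sqrt(1 + 4 / (c * c))
    z1 = a + b * math.exp(-0.5 * c * c * (3 + r))
    z2 = a + b * math.exp(-0.5 * c * c * (3 - r))
    return z1, z2


def second_difference(z, a, b, c, h=1e-3):
    """Central-difference estimate of F''(z)."""
    return (density(z + h, a, b, c) - 2 * density(z, a, b, c)
            + density(z - h, a, b, c)) / (h * h)


a, b, c = 0.0, 1.0, 0.4
z1, z2 = inflection_points(a, b, c)
print(z1, z2, second_difference(z1, a, b, c), second_difference(z2, a, b, c))
```

For $c = 0.4$ both second differences are close to zero and change sign across the computed points, and the mode, median and mean all fall between them.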


3. HIGH CONTACT

A frequency function is said to have high contact, if the function and all its derivatives vanish at the upper and the lower limits of the variable $z$. We know that the logarithmic frequency function vanishes at both the finite and the infinite limits of $z$. It can be easily seen that all its derivatives also vanish at these points, if we make the substitution $z' = \log\dfrac{z-a}{b}$, which will throw every derivative of the logarithmic frequency function into a product of two factors, one being a polynomial in $z'$ and another being $e^{-\left(\frac{z'^2}{2c^2} + kz'\right)}$, where $k$ is a positive integer. Thus, it is obvious that all the derivatives become zero as $z'$ approaches $\pm\infty$, which correspond to the finite and the infinite limits of $z$. For instance, this substitution will put the first derivative of the logarithmic frequency function $F(z)$,

$$\frac{dF(z)}{dz} = -\frac{\log\frac{z-a}{b} + c^2}{c^2(z-a)}\,F(z),$$

into the form

$$-\frac{z' + c^2}{\sqrt{2\pi}\,c^3b^2}\,e^{-\left(\frac{z'^2}{2c^2} + 2z'\right)},$$

which clearly goes to zero as $z'$ approaches $\pm\infty$, that is, as $z$ approaches $a$ and infinity.

The logarithmic frequency function, therefore, has high contact.
4. MOMENTS 


We shall study the practical application of the method of 
moments to determine the parameters of the logarithmic frequency 
distribution in section 9. But at present we must know the rela- 
tionships between the parameters and the moments in order to 
discuss the properties of dispersion, skewness and kurtosis. 


First, we shall express the moments in terms of the param- 
eters: 


The ¢-¢4 moment of g about the point z<@_ is given by 


oo @,@ 
Mg [(28) Fladderbe 
@ 


2s-L)c? 
And we also have the recurring relation MM, = be Ms4° 


The s-¢4 moment of x about the mean is 


° ss* Ss . 1p2 
/ Lem Fla)ds=b8e t 2 pe 


a, : 
Consequently, the s-7#4 standard moment of z, « 3 * Ze 1S 


&, «(e°=1) 1s CU Jeg lle? | 


Setting s equal to 3 and 4, we have 
%,* tet t/t (e%s2) 
2 2 
%,* Se(e oi lle Is ge%*, 66°46) 





=== 


elie eee S aiiiemnemnaamntdtti nel 
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which will be discussed in connection with skewness and kurtosis. 
Note that the sign of of - follows that of & , because the sign of 
the third moment of z about the mean is determined by 4%. 

Now, we want to express the parameters in terms of the 
moments. It is clear that there is an infinite number of ways to 
accomplish this, since there is an infinitude of moments. But we 
are particularly interested in the expressions of the parameters 
in terms of the mean and the second and third moments about 
the mean. Letting $w = e^{c^2}$, we have

$$m = a + b\,w^{1/2},\qquad \mu_2 = b^2\,w\,(w-1),\qquad \mu_3 = b^3\,w^{3/2}(w-1)^2(w+2). \tag{2}$$

Solving these equations for the parameters, we find that $w$ is the only real root of the cubic:

$$w^3 + 3w^2 - (\alpha_3^2 + 4) = 0. \tag{3}$$

Hence, the parameters $c$, $b$ and $a$ may be expressed as

$$c = (\log w)^{1/2},\qquad b = \pm\frac{\sigma}{[w(w-1)]^{1/2}},\qquad a = m - b\,w^{1/2} = m \mp \frac{\sigma}{(w-1)^{1/2}}, \tag{4}$$

where the sign of $b$ follows that of $\alpha_3$, and $\sigma = \mu_2^{1/2}$. The practical application of (3) and (4) will be discussed in section 9.
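Formulae (3) and (4) translate directly into a short routine. The sketch below is illustrative only: the function name `fit_logarithmic_curve` is hypothetical, and the cubic is solved by plain bisection as a stand-in for the tabular method described in section 9.

```python
import math

def fit_logarithmic_curve(mean, sigma, alpha3):
    """Parameters (c, b, a) from the mean, standard deviation and alpha_3 via (3) and (4)."""
    if alpha3 == 0:
        raise ValueError("alpha3 = 0 is the normal limit; c, b and a are not defined")
    target = alpha3 * alpha3 + 4.0
    lo, hi = 1.0, 2.0
    while hi**3 + 3.0 * hi**2 < target:    # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):                   # bisection on w^3 + 3w^2 = alpha3^2 + 4
        mid = 0.5 * (lo + hi)
        if mid**3 + 3.0 * mid**2 < target:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    c = math.sqrt(math.log(w))
    # The sign of b follows that of alpha3.
    b = math.copysign(sigma / math.sqrt(w * (w - 1.0)), alpha3)
    a = mean - b * math.sqrt(w)
    return c, b, a
```

Since $w^3 + 3w^2$ increases steadily for $w > 1$ and equals 4 at $w = 1$, the bracket starting at 1 always encloses the single real root.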


We shall now turn our attention to some other properties of the logarithmic frequency distribution.


5. DISPERSION 


The dispersion of $z$ about the mean may be measured by the standard deviation, $\sigma = \sqrt{\mu_2} = b\,e^{c^2/2}(e^{c^2}-1)^{1/2}$. Denote the deviation of $z$ from the mean in terms of the standard deviation by $t = (z-m)/\sigma$. Then, with the aid of formulae (1), (2) and (4), we obtain the distribution of $t$ as

$$\frac{(e^{c^2}-1)^{1/2}}{\sqrt{2\pi}\,c\left[1+(e^{c^2}-1)^{1/2}\,t\right]}\; e^{-\frac{1}{2c^2}\left\{\log\left[1+(e^{c^2}-1)^{1/2}\,t\right]+\frac{c^2}{2}\right\}^2} dt, \tag{5}$$

where $(e^{c^2}-1)^{1/2} = \alpha_3/(e^{c^2}+2)$ takes the same sign as $\alpha_3$.


We know that for the normal distribution 50% of the total frequency lies between the limits $t = -.6745$ and $t = +.6745$. Now, we want to know the similar limits of $t$ for the logarithmic distribution. For that reason, we write $t$ directly in terms of the normally distributed function $x = \frac{1}{c}\log\frac{z-a}{b}$:

$$t = \frac{e^{cx - c^2/2} - 1}{(e^{c^2}-1)^{1/2}}. \tag{6}$$

Placing $x$ equal to $-.6745$ and $+.6745$, we have at once the limits

$$\frac{e^{-.6745c - c^2/2} - 1}{(e^{c^2}-1)^{1/2}} \quad\text{and}\quad \frac{e^{.6745c - c^2/2} - 1}{(e^{c^2}-1)^{1/2}},$$

between which 50% of the total frequency is included. These limits are two quartiles and obviously depend on $c$. It is clear that one can also locate other deciles and percentiles of $t$ by using (6).
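As an illustration of (6), the quartiles and other percentiles of $t$ may be computed as below. This is a sketch under the assumption that $b$ is positive; the function names are hypothetical, and the normal deviate is taken from `statistics.NormalDist` in the Python standard library rather than from a printed table.

```python
import math
from statistics import NormalDist

def t_from_x(x, c):
    """Formula (6): the standardized deviation t for a normal deviate x (b > 0 assumed)."""
    return (math.exp(c * x - 0.5 * c * c) - 1.0) / math.sqrt(math.exp(c * c) - 1.0)

def t_percentile(p, c):
    """Value of t below which a fraction p of the total frequency lies."""
    return t_from_x(NormalDist().inv_cdf(p), c)

def quartile_limits(c):
    """The two limits between which 50% of the total frequency is included."""
    return t_percentile(0.25, c), t_percentile(0.75, c)

if __name__ == "__main__":
    for c in (0.01, 0.3, 0.6):
        print(c, quartile_limits(c))
```

For small $c$ the limits come close to the normal values $\mp .6745$; as $c$ grows the lower quartile moves away from the mean and the upper quartile moves toward it.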


An abstract measure of the dispersion is the coefficient of variability, which expresses the standard deviation in terms of the mean. For the logarithmic distribution, it is

$$V = \left|\frac{\sigma}{m-a}\right| = (e^{c^2}-1)^{1/2}, \tag{7}$$

which shows that in a logarithmic distribution the larger $c^2$ is, the greater is the variability.

It is interesting to note that if we also express the deviation of $z$ from the mean in terms of $(m-a)$ and denote it by

$$t' = \frac{z-m}{m-a} = (e^{c^2}-1)^{1/2}\,t,$$

we have by (5) the distribution of $t'$ in this simple form:

$$\frac{1}{\sqrt{2\pi}\,c\,(1+t')}\; e^{-\frac{1}{2c^2}\left[\log(1+t')+\frac{c^2}{2}\right]^2} dt'. \tag{8}$$


6. SKEWNESS

It has been proposed to use $\alpha_3/2$ or $\alpha_3$ as a measure of skewness of a frequency distribution. For the logarithmic curve, we have shown that

$$\alpha_3 = (e^{c^2}-1)^{1/2}(e^{c^2}+2) \quad\text{or}\quad \alpha_3 = (w-1)^{1/2}(w+2). \tag{9}$$

Hence, the absolute value of $\alpha_3$ increases with $c$. Since $c$ can take on any finite value whatever, the skewness of the logarithmic curve as measured by $\alpha_3$ can also have any finite value. Moreover, as we have seen, $\alpha_3$ of the logarithmic distribution can be positive as well as negative.

In Figure I are shown four logarithmic curves with $m = 0$, $\sigma = 1$ and with varying $\alpha_3$'s. Various parameters calculated from formulae (4) and important characteristics of these curves are exhibited in Table I.
When $c = 0$, $\alpha_3$ also vanishes. In fact, the logarithmic curve approaches the normal curve of error as $c$ goes to zero. This can be demonstrated as follows: With the aid of formulae (4) we can write the normally distributed function $x = \frac{1}{c}\log\frac{z-a}{b}$ as

$$x = \frac{1}{c}\log\left[1+\frac{z-m}{\sigma}(e^{c^2}-1)^{1/2}\right] + \frac{c}{2} = \frac{z-m}{\sigma}\cdot\frac{(e^{c^2}-1)^{1/2}}{c} - \frac{(z-m)^2}{\sigma^2}\cdot\frac{e^{c^2}-1}{2c} + \cdots + \frac{c}{2}.$$

Now, it can be easily seen that

$$\lim_{c\to 0} x = \frac{z-m}{\sigma},$$

which is a linear function of $z$. Hence, the logarithmic distribution of $z$ approaches the normal distribution as $c$ approaches zero.


TABLE I

Parameters and Important Characteristics of the Logarithmic Curves with $m = 0$, $\sigma = 1$ and Specified $\alpha_3$'s


FIGURE I

Logarithmic Curves with $m = 0$, $\sigma = 1$ and Specified $\alpha_3$'s
Another measure of skewness is defined by Pearson as

$$\chi = \frac{\text{mean} - \text{mode}}{\sigma} = \frac{1 - w^{-3/2}}{(w-1)^{1/2}},$$

which has a maximum value equal to .6561, when $w = 1.7202$ and $\alpha_3 = 3.1571$. This, however, does not indicate that the skewness of the logarithmic curve is limited. Rather it shows that $\chi$ is not a satisfactory measure of skewness, so far as the logarithmic curve is concerned. For any measure of skewness should characterize the skewness of a curve without ambiguity, and $\chi$ fails to do so in case of the logarithmic curve. For instance, when we say that a certain logarithmic curve has $\chi = .32$, we may mean either a logarithmic curve with $\alpha_3 = .69$ or one with $\alpha_3 = 36.00$.

When the logarithmic curve is only moderately skew, $\chi$ approximately equals $\alpha_3/2$. This can be shown as follows: Letting $h^2 = w - 1$, we have

$$\chi = \frac{1-(1+h^2)^{-3/2}}{h} = \frac{3}{2}h - \frac{15}{8}h^3 + \frac{35}{16}h^5 - \cdots$$

and $\alpha_3 = 3h + h^3$.

Hence, for small $|h|$ and hence small $|\alpha_3|$, $\chi$ approximately equals $\alpha_3/2$. For instance, when $\alpha_3 = .2$, $\chi = .0993$, which is approximately $\alpha_3/2 = .1$.
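The ambiguity of Pearson's measure can be seen numerically. The sketch below assumes that the measure is (mean − mode)/σ, which for the logarithmic curve with positive $b$ reduces to $(1 - w^{-3/2})/(w-1)^{1/2}$; the function names are hypothetical.

```python
import math

def alpha3(w):
    """Skewness alpha_3 of the logarithmic curve as a function of w = exp(c^2)."""
    return math.sqrt(w - 1.0) * (w + 2.0)

def pearson_chi(w):
    """(mean - mode) / sigma for the logarithmic curve with b > 0 (assumed definition)."""
    return (1.0 - w ** -1.5) / math.sqrt(w - 1.0)

if __name__ == "__main__":
    # alpha_3 keeps rising with w, while chi rises to a maximum and then falls,
    # so two very different curves can share one value of chi.
    for w in (1.0044, 1.0516, 1.7202, 10.0):
        print(f"w = {w:<7} alpha_3 = {alpha3(w):8.4f}  chi = {pearson_chi(w):.4f}")
```

The values of $w$ in the loop are chosen only for illustration: two of them give nearly the same $\chi$ although their $\alpha_3$ differ widely.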

We may mention here that for the Pearsonian type III curve, the relation $\chi = \alpha_3/2$ always holds. In fact, it appears from Table II that the type III curve and the logarithmic curve are very similar for small $|\alpha_3|$. But the differences between them are already pronounced for $\alpha_3 = 1$, as we can see from Table III.
TABLE II

Ordinates and Areas of the Logarithmic Curve and the Pearsonian Type III Curve


Ordinate at x 








Log. Curve 


Type III


0003 
0020 
124 
049] 
1337 
2587 
3692 
399] 
300 
2267 
1242 
568 
0217 
0072 
0020 
WM 
0002 


$m = 0$, $\sigma = 1$, $\alpha_3 = .2$





Limit to x 
Log. Curve 


Area from the Lower 


Type III 





1 
1.0
1.5 
2.0 
2.5 

3.0 
3.5 

4.0 
4.5 
5.0 
5.5 


6.0 
6.5 
TABLE III

Ordinates and Areas of the Logarithmic Curve and the Pearsonian Type III Curve

$m = 0$, $\sigma = 1$, $\alpha_3 = 1$

Ordinate at x 


Log. Curve 


0084 


Type III 


Area from the Lower Limit to x

Log. Curve    Type III
.0009         0
.0259         .0190
.1398         .1429
.3442         .3528
.5624         .5065
.7363         .7345
.8520         .8488
.9210         .9182
.9590         .9576
.9783         .9788
.9895         .9897
.9948         .9951
.9977         .9977
.9987         .9990
.9993         .9995
.9997         .9998
.9998         .9999
.9999
7. KURTOSIS

Another important characteristic of a frequency curve is kurtosis, measured by $\frac{1}{8}(\alpha_4 - 3)$ or simply by $\eta = \alpha_4 - 3$, which equals zero for the normal law of error. If the mean and the standard deviation are taken to be the origin and the unit, respectively, then usually the frequency of a curve in the vicinity of the mean is in excess or in defect of that of a normal curve according to whether $\eta$ is positive or negative. A curve is said to be leptokurtic if $\eta > 0$. It is platykurtic if $\eta < 0$. Thus, the logarithmic curve is always leptokurtic, for its $\eta$ is

$$\eta = (w-1)(w^3+3w^2+6w+6) \quad\text{or}\quad \eta = w^4+2w^3+3w^2-6, \tag{10}$$

and $w > 1$. Since the logarithmic curve has only three parameters, there exists a functional relationship between its skewness and kurtosis. This relationship is given through the parameter $w$ by (9) and (10). We may further deduce the following relations from these two equations:

$\eta$ is always greater than $\frac{3}{2}\alpha_3^2$. This follows from the fact that $2\eta - 3\alpha_3^2 = (w-1)(2w^3+3w^2) > 0$.

For $|\alpha_3| < 6.44$, we have $3\alpha_3^2 > \eta$, since $3\alpha_3^2 - \eta = (w-1)(-w^3+6w+6) > 0$ holds, provided $w < 2.8$, which corresponds to $|\alpha_3| < 6.44$.

For $|\alpha_3| < 2.15$, we have $2\alpha_3^2 > \eta$, since $2\alpha_3^2 - \eta = (w-1)(-w^3-w^2+2w+2) > 0$ holds, provided $w < 1.4$, which corresponds to $|\alpha_3| < 2.15$.

Since practically the value of $|\alpha_3|$ can hardly reach 6.44 or even 2.15, the relations just stated hold for all practical instances.
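The three relations between $\eta$ and $\alpha_3^2$ are easy to verify over a range of $w$. The following sketch is only a numerical check of the inequalities stated above; the function names are hypothetical.

```python
def eta(w):
    """Kurtosis eta = alpha_4 - 3 of the logarithmic curve, formula (10)."""
    return (w - 1.0) * (w**3 + 3.0 * w**2 + 6.0 * w + 6.0)

def alpha3_squared(w):
    """Square of the skewness alpha_3, from formula (9)."""
    return (w - 1.0) * (w + 2.0) ** 2

def check_relations(w):
    """Return the truth of: eta > (3/2) alpha3^2, eta < 3 alpha3^2, eta < 2 alpha3^2."""
    e, a2 = eta(w), alpha3_squared(w)
    return e > 1.5 * a2, e < 3.0 * a2, e < 2.0 * a2

if __name__ == "__main__":
    for w in (1.05, 1.4, 2.0, 2.8, 3.0):
        print(w, check_relations(w))
```

The first relation holds for every $w > 1$; the second and third fail only beyond the limits of $w$ quoted in the text.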

The relationship existing between $\eta$ and $\alpha_3$ is sometimes used as a criterion for applying the logarithmic curve to observed data. We shall discuss this point in section 9.
8. POWERS, ROOTS AND PRODUCTS OF THE LOGARITHMICALLY DISTRIBUTED VARIABLES

If $z$ is logarithmically distributed and has "a" as its lower limit, $W = (z-a)^h$ is also so distributed, $h$ being any constant. This follows from the fact that if $x = \frac{1}{c}\log\frac{z-a}{b}$ is normally distributed, so is $x = \frac{1}{hc}\log\frac{W}{b^h}$. From the frequency function of $z$, $F(z)$, given by (1), we find at once the analytic expression of the frequency distribution of $W$ to be

$$\frac{1}{\sqrt{2\pi}\,hc\,W}\; e^{-\frac{1}{2h^2c^2}\left[\log\frac{W}{b^h}\right]^2} dW. \tag{11}$$

We have learned from the preceding sections that a logarithmic distribution represented by (1) with larger $c$ has greater variability, skewness, and kurtosis. Thus, if $h > 1$, the variability, skewness, and kurtosis are greater for $W$ than for $z$. On the other hand, if $h < 1$, the distribution of $z$ has greater variability, skewness, and kurtosis.

If the logarithmically distributed variables $z_1, z_2, \ldots, z_n$ are independent and have for their lower limits $a_1, a_2, \ldots, a_n$, then the product

$$Y = (z_1-a_1)(z_2-a_2)\cdots(z_n-a_n)$$

is also so distributed. This follows from the fact that if

$$x_1 = \frac{1}{c_1}\log\frac{z_1-a_1}{b_1},\quad x_2 = \frac{1}{c_2}\log\frac{z_2-a_2}{b_2},\quad \ldots,\quad x_n = \frac{1}{c_n}\log\frac{z_n-a_n}{b_n}$$

are each normally distributed and are independent, the sum $c_1x_1 + c_2x_2 + \cdots + c_nx_n$ also obeys the normal law of error.

Since the variables are independent, the frequency distribution of these $n$ variables is represented by

$$F_1(z_1)\,F_2(z_2)\cdots F_n(z_n)\,dz_1\,dz_2\cdots dz_n, \tag{12}$$

where

$$F_i(z_i) = \frac{1}{\sqrt{2\pi}\,c_i\,(z_i-a_i)}\; e^{-\frac{1}{2c_i^2}\left[\log\frac{z_i-a_i}{b_i}\right]^2}.$$

Substituting $z_1-a_1 = Y/[(z_2-a_2)\cdots(z_n-a_n)]$ in (12) and integrating the resulting expression with respect to $z_2, \ldots, z_n$ successively over the respective ranges, we have the distribution of $Y$ as

$$\frac{1}{\sqrt{2\pi}\,(c_1^2+c_2^2+\cdots+c_n^2)^{1/2}\,Y}\; e^{-\frac{1}{2(c_1^2+c_2^2+\cdots+c_n^2)}\left[\log\frac{Y}{b_1b_2\cdots b_n}\right]^2} dY. \tag{13}$$

Since the sum, $c_1^2+c_2^2+\cdots+c_n^2$, is greater than any $c_i^2$, the distribution of $Y$ has greater variability, skewness, and kurtosis than that of each individual variable.
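The way the parameters combine under powers and products, as read off from (11) and (13), can be put in a few lines. This is a sketch with hypothetical function names; taking the absolute value of $h$ for the new $c$ is an assumption made here so that a negative exponent still yields a positive $c$.

```python
import math

def power_parameters(c, b, h):
    """(c, b) of W = (z - a)**h when z has parameters (c, b); read off from (11)."""
    return abs(h) * c, b ** h

def product_parameters(cs, bs):
    """(c, b) of Y = prod(z_i - a_i) for independent factors; read off from (13)."""
    c_y = math.sqrt(sum(c * c for c in cs))
    b_y = math.prod(bs)
    return c_y, b_y

if __name__ == "__main__":
    print(power_parameters(0.3, 2.0, 2))
    print(product_parameters([0.3, 0.4], [2.0, 5.0]))
```

Since the $c$ of the product is the root of the sum of squares, it always exceeds the largest individual $c$, which is the statement made at the end of the section.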


9. NUMERICAL APPLICATIONS 


Many methods of fitting a logarithmic frequency curve to observed data have been proposed. But only the method of moments will be considered below.*

The method of moments is very simple to apply. It consists of placing the computed moments in equations (2) and then determining the parameters by solving these equations by formulae (3) and (4).† The only step of computation which requires some time and care to obtain accurate results is the solution of the cubic,

$$\psi(w) = w^3 + 3w^2 - (\alpha_3^2 + 4) = 0.$$

Hence, it is desirable to have a table which will provide an approximation of the required root of this cubic for a given $\alpha_3$. Then, the root can be approximated to as great a degree of accuracy as we wish by applying, for instance, Newton's method. That is why Table IV is constructed. Practically, after we obtain an approximate value of $w$ from Table IV, one single application of Newton's method will almost invariably suffice to give us a value of $w$ accurate to four decimal places. In Table IV, values of $c$ corresponding to given values of $w$ are also provided to serve as a check to our computation of $c$ by formulae (4).

* Among other methods of graduating the logarithmic frequency distribution, the graphical method proposed by Kapteyn and Van Uven is especially useful. For a description of this method, refer to their paper on "Skew Frequency Curves in Biology and Statistics, 2nd Paper".

† In his paper, "On the Genetic Theory of Frequency", Wicksell also showed the application of the method of moments to the logarithmic frequency distribution. However, he found the parameter "a" first and then proceeded to obtain "log b" and "c".


TABLE IV

Table Facilitating the Solution of the Cubic $w^3 + 3w^2 - (\alpha_3^2 + 4) = 0$

  w      α₃        c      |    w      α₃        c
 1.00    0         0      |   1.25   1.6250    .4723
 1.01    .3010     .1000  |   1.26   1.6623    .4807
 1.02    .4271     .1407  |   1.27   1.6991    .4889
 1.03    .5248     .1720  |   1.28   1.7356    .4969
 1.04    .6080     .1980  |   1.29   1.7717    .5046
 1.05    .6820     .2209  |   1.30   1.8075-   .5122
 1.06    .7495+    .2414  |   1.31   1.8429    .5196
 1.07    .8122     .2602  |   1.32   1.8781    .5269
 1.08    .8712     .2774  |   1.33   1.9129    .5340
 1.09    .9270     .2936  |   1.34   1.9475+   .5410
 1.10    .9803     .3087  |   1.35   1.9819    .5478
 1.11   1.0315-    .3231  |   1.36   2.0160    .5545+
 1.12   1.0808     .3366  |   1.37   2.0499    .5611
 1.13   1.1285+    .3496  |   1.38   2.0836    .5675+
 1.14   1.1749     .3619  |   1.39   2.1171    .5738
 1.15   1.2200     .3739  |   1.40   2.1503    .5801
 1.16   1.2640     .3852  |   1.41   2.1835-   .5862
 1.17   1.3070     .3962  |   1.42   2.2164    .5922
 1.18   1.3492     .4068  |   1.43   2.2492    .5981
 1.19   1.3905-    .4171  |   1.44   2.2818    .6038
 1.20   1.4311     .4270  |   1.45   2.3143    .6096
 1.21   1.4710     .4366  |   1.46   2.3467    .6151
 1.22   1.5103     .4460  |   1.47   2.3789    .6207
 1.23   1.5491     .4550  |   1.48   2.4110    .6261
 1.24   1.5873     .4638  |   1.49   2.4430    .6315+
To illustrate the use of Table IV and to help in studying the application of the logarithmic frequency curve, we take the distribution of the weights of 1,000 female students from the "Synopsis of Elementary Mathematical Statistics"* by Miss B. L. Shook. (See Table V.)

The mean, standard deviation, and skewness for this distribution† are

$m = 118.74$ lbs.
$\sigma = 16.91752$ lbs.
$\alpha_3 = .976424$

To compute $w$, we find from Table IV that for $\alpha_3 = .976424$, $w$ is approximately $w_0 = 1.10$. For a better approximation, we apply Newton's method:

$$w = w_0 - \frac{\psi(w_0)}{\psi'(w_0)} = w_0 - \frac{w_0^3 + 3w_0^2 - (\alpha_3^2 + 4)}{3w_0^2 + 6w_0} = 1.10 - \frac{.007596}{10.23} = 1.10 - .000743 = 1.099257.$$

By formulae (4), the parameters $c$, $b$ and $a$ are found to be

$c = .307627$
$b = 51.2160$ lbs.
$a = 65.0423$ lbs.
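The arithmetic of this example can be retraced in a few lines. The sketch below applies one Newton step from the tabular value $w_0 = 1.10$ and then formulae (4); the variable and function names are hypothetical, and the input figures are those quoted above.

```python
import math

def psi(w, alpha3):
    """The cubic whose root is w = exp(c^2)."""
    return w**3 + 3.0 * w**2 - (alpha3**2 + 4.0)

def psi_prime(w):
    """Derivative of the cubic with respect to w."""
    return 3.0 * w**2 + 6.0 * w

def newton_step(w0, alpha3):
    """One application of Newton's method starting from w0."""
    return w0 - psi(w0, alpha3) / psi_prime(w0)

mean, sigma, alpha3 = 118.74, 16.91752, 0.976424   # figures quoted in the text
w1 = newton_step(1.10, alpha3)                     # 1.10 is the entry taken from Table IV
c = math.sqrt(math.log(w1))
b = sigma / math.sqrt(w1 * (w1 - 1.0))
a = mean - b * math.sqrt(w1)

if __name__ == "__main__":
    print(f"w = {w1:.6f}, c = {c:.6f}, b = {b:.4f}, a = {a:.4f}")
```

The results agree with the figures in the text to about four decimal places; the last digits may differ slightly because the text rounds $w$ before computing $c$, $b$ and $a$.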


* Annals of Mathematical Statistics, Vol. I, No. 1 (1930), p. 39. 
† Sheppard's corrections have been duly applied.








TABLE V

Observed and Theoretical Distributions of the Weights of 1,000 Female Students
(Original Measurements Made to Nearest 1/10 lb.)










Theoretical 
Type III 
Distribution 

By Areas 


Class Limits (Pounds)





















70- 79.9 0 
80- 89.9 4 
90- 99.9 102 
100-109.9 238 
110-119.9 250 
120-129.9 184 
130-139.9 111 
140-149.9 59 
150-159.9 29 
160-169.9 13 
170-179.9 6 
180-189.9 3 
190-199.9 1 
200-209.9 0 
210-219.9


Total 


Knowing $c$, $b$ and $a$, we obtain the geometric mean and the mode:

$m_g = m_d = 116.2583$ lbs.


Using these parameters, the theoretical distribution of the 
weights of 1,000 female students has been computed and is shown 
in Table V and Figure II. The fit of the logarithmic distribution 
to the observed data is, indeed, excellent.* The lowest possible 


weight of female students, according to the theoretical distribu- 
tion, is 65.04 pounds, which is just about what one would expect 
after examining the observed data. 

Miss Shook† used the type III distribution to fit the same set of observed data and gave the result as shown in the last column of Table V. The fit is not as good as that given by the logarithmic distribution, especially in view of the fact that the type III curve fixes the least possible weight at 84.09 pounds, while as a matter of fact there are two students whose weights are below that limit.‡

* Grouping the first three classes into one class and the last six classes into one class, we apply the $\chi^2$ test for goodness of fit and find that the probability to get a worse fit is .70.

† Annals of Mathematical Statistics, Vol. I, No. 3 (1930), p. 242.

From the standpoint of the method of moments, a criterion for the logarithmic distribution to fit a set of observed data is that $\eta = \alpha_4 - 3$ computed directly from the observed data must be approximately the same as the theoretical $\eta$ computed from formula (10). This criterion, however, does not seem to work in practice. For instance, for the distribution of the weights of 1,000 female students, the theoretical $\eta$ is 1.7419, while the observed $\eta$ is 2.4536. But in spite of this fact, the observed distribution, as we have seen, is very satisfactorily fitted by a logarithmic distribution.

Another criterion is to require the observed moments about the lower limit "a" to satisfy approximately the recurring relation

$$\mu'_s = b\,e^{(2s-1)c^2/2}\,\mu'_{s-1}$$

for $s = 4$. This criterion is approximately fulfilled by the distribution of the weights of 1,000 female students, for which we have

$$\mu'_4 = 1.47279 \times 10^7,$$

$$b\,e^{7c^2/2}\,\mu'_3 = 1.46696 \times 10^7,$$

and $\mu'_4 / (b\,e^{7c^2/2}\,\mu'_3) = 1.0040$.

The fact that a set of observed data may be satisfactorily graduated by the logarithmic distribution but fulfills only the second criterion may be explained on the ground that the comparatively wide discrepancy between the observed and theoretical frequency in the classes near the lower limit makes a great difference in the fourth moment about the mean but does not make much difference in the fourth moment about the point "a".

‡ In fact, since the finite limit of the variable for the type III curve is $m - \frac{2}{\alpha_3}\sigma$ and for the logarithmic curve is $m - \frac{w+2}{\alpha_3}\sigma$, the finite limit is always greater in absolute value for the logarithmic curve than for the type III curve.


PART II 


THE SEMI-LOGARITHMIC CORRELATION SURFACE 


Suppose that the correlation surface of the functions, $x = f(u, v)$ and $y = g(u, v)$, is a normal correlation surface and each has its mean as the origin and its standard deviation as the unit. Then, the probability that values of $x$ will lie between $x$ and $x + dx$ and values of $y$ between $y$ and $y + dy$ is

$$P(x, y)\,dx\,dy = \frac{1}{2\pi\sqrt{1-r^2}}\; e^{-\frac{1}{2(1-r^2)}\left[x^2 - 2rxy + y^2\right]} dx\,dy. \tag{1}$$

It follows that the probability that values of $u$ will lie between $u$ and $u + du$ and values of $v$ between $v$ and $v + dv$ is $F(u, v)\,du\,dv$ given by

$$F(u,v)\,du\,dv = \frac{1}{2\pi\sqrt{1-r^2}}\; e^{-\frac{1}{2(1-r^2)}\left[f^2 - 2rfg + g^2\right]} \begin{vmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \\[2mm] \dfrac{\partial g}{\partial u} & \dfrac{\partial g}{\partial v} \end{vmatrix} du\,dv. \tag{2}$$

$F(u, v)$ is, therefore, a generalized correlation surface of two variables, deduced by extending the method of translation for treating frequency distributions of one variable.

It is clear that in this general form the correlation surface represented by $F(u, v)$ is of little practical use, on account of its complexity. Now a natural simplification suggests itself. That is to take $x$ as a function of $u$ only and $y$ as a function of $v$ only. By virtue of this simplification, $F(u, v)$ becomes

$$F(u,v) = \frac{1}{2\pi\sqrt{1-r^2}}\; e^{-\frac{1}{2(1-r^2)}\left[f^2 - 2rfg + g^2\right]}\,\frac{df}{du}\,\frac{dg}{dv}, \tag{3}$$

which is a great deal easier to handle than before.
Professor Wicksell has made use of (3) for the special case where, in our notations, $x$ and $y$ are

$$x = \frac{1}{c_1}\log\frac{u-a_1}{b_1},\qquad y = \frac{1}{c_2}\log\frac{v-a_2}{b_2},$$

which leads to the so-called "logarithmic correlation surface".* The surface possesses the property that its marginal distributions as well as the distributions of $u$ for given values of $v$ and distributions of $v$ for given values of $u$ are all logarithmic frequency distributions.

Presently we shall study another case for which

$$x = \frac{u-l}{\lambda},\qquad y = \frac{1}{c}\log\frac{v-a}{b}.$$

The correlation surface $F(u, v)$ given by (3) then becomes:

$$F(u,v) = \frac{1}{2\pi\lambda c\,(v-a)\sqrt{1-r^2}}\; e^{-\frac{1}{2(1-r^2)}\left[\left(\frac{u-l}{\lambda}\right)^2 - 2r\,\frac{u-l}{\lambda}\cdot\frac{1}{c}\log\frac{v-a}{b} + \left(\frac{1}{c}\log\frac{v-a}{b}\right)^2\right]}, \tag{4}$$

which may be appropriately called a semi-logarithmic correlation surface. We shall investigate its marginal distributions, moments and regression curves of the characteristics.

* In Wicksell's paper, "On the Genetic Theory of Frequency", the theory of the logarithmic correlation function is developed. In his two successive papers quoted in the Bibliography of this paper, the original theory is extended and the application of the extended results illustrated.


1. MARGINAL DISTRIBUTIONS

Now, we shall first find the distribution of the marginal totals of $u$. This can be, of course, accomplished very easily by integrating $F(u, v)$ with respect to $v$ over the range from "a" to infinity. The result is:

$$\int_a^{\infty} F(u,v)\,dv = \frac{1}{\sqrt{2\pi}\,\lambda}\; e^{-\frac{(u-l)^2}{2\lambda^2}}. \tag{5}$$

Thus, the marginal distribution of $u$ obeys the normal law of error.

Similarly, if we integrate $F(u, v)$ with respect to $u$ over the range from $-\infty$ to $\infty$, we find at once the marginal distribution of $v$ as follows:

$$\int_{-\infty}^{\infty} F(u,v)\,du = \frac{1}{\sqrt{2\pi}\,c\,(v-a)}\; e^{-\frac{1}{2c^2}\left[\log\frac{v-a}{b}\right]^2}, \tag{6}$$

which is clearly a logarithmic distribution and, therefore, has all the properties and characteristics discussed in Part I. Hence, the semi-logarithmic correlation surface is characterized by the fact that one marginal distribution is normal, while the other is logarithmic. It is needless to mention that this does not constitute a sufficient condition for a correlation surface to be a semi-logarithmic correlation surface defined by (4).


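2. MOMENTS

The moment, $\mu'_{jk}$, of the semi-logarithmic correlation surface about the point $u = l$ and $v = a$ is given by

$$\mu'_{jk} = \int_a^{\infty}\!\!\int_{-\infty}^{\infty} (u-l)^j (v-a)^k F(u,v)\,du\,dv = \lambda^j b^k e^{k^2c^2/2} \sum_{i=0}^{j} \binom{j}{i}\,(rkc)^{j-i}\, I_i, \tag{7}$$

where

$$I_i = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} t^i e^{-t^2/2}\,dt = \frac{i!}{2^{i/2}\,(i/2)!} \ \text{ if } i \text{ is even},$$

$$I_i = 0 \ \text{ if } i \text{ is odd}.$$

Using relation (7), we can easily calculate the following six moments about the mean of $u$, $m_u$, and the mean of $v$, $m_v$:

$$\begin{aligned}
\mu_{10} &= m_u - l = 0,\\
\mu_{20} &= \lambda^2,\\
\mu_{01} &= m_v - (b\,e^{c^2/2} + a) = 0,\\
\mu_{02} &= b^2 e^{c^2}(e^{c^2}-1),\\
\mu_{03} &= b^3 e^{3c^2/2}(e^{c^2}-1)^2(e^{c^2}+2),\\
\mu_{11} &= r\lambda c\,b\,e^{c^2/2}.
\end{aligned} \tag{8}$$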


Now, we want to solve these equations for the six parameters. As before, we let $w = e^{c^2}$ and write $\sigma_u = \mu_{20}^{1/2}$, $\sigma_v = \mu_{02}^{1/2}$ and $\alpha_3 = \mu_{03}/\sigma_v^3$. Again, we have $w$ as the only real root of the cubic:

$$w^3 + 3w^2 - (\alpha_3^2 + 4) = 0. \tag{9}$$

The six parameters of the semi-logarithmic correlation surface can be written as:

$$\begin{aligned}
l &= m_u,\\
\lambda &= \sigma_u,\\
c &= (\log w)^{1/2},\\
b &= \left[\frac{\mu_{02}}{w(w-1)}\right]^{1/2} = \frac{\sigma_v}{[w(w-1)]^{1/2}},\\
a &= m_v - \frac{\sigma_v}{(w-1)^{1/2}},\\
r &= \frac{\mu_{11}}{\sigma_u\sigma_v}\cdot\frac{(w-1)^{1/2}}{(\log w)^{1/2}},
\end{aligned} \tag{10}$$

which furnish us a simple practical method for determining the parameters of the semi-logarithmic correlation surface for observed data.
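Formulae (9) and (10) can be gathered into one routine. The sketch below is only illustrative: the function name `fit_semi_logarithmic` and its argument names are hypothetical, positive skewness of $v$ (hence positive $b$) is assumed, and the cubic is solved by Newton's method started from $1 + (\alpha_3/3)^2$, a value which cannot fall below the root.

```python
import math

def fit_semi_logarithmic(m_u, sigma_u, m_v, sigma_v, alpha3_v, mu11):
    """Six parameters (l, lam, c, b, a, r) from the moments, following (9) and (10)."""
    if alpha3_v <= 0:
        raise ValueError("this sketch assumes positive skewness of v (b > 0)")
    target = alpha3_v**2 + 4.0
    w = 1.0 + (alpha3_v / 3.0) ** 2        # starting value at or above the root
    for _ in range(50):                    # Newton's method on w^3 + 3w^2 - target = 0
        step = (w**3 + 3.0 * w**2 - target) / (3.0 * w**2 + 6.0 * w)
        w -= step
        if abs(step) < 1e-15:
            break
    c = math.sqrt(math.log(w))
    b = sigma_v / math.sqrt(w * (w - 1.0))
    a = m_v - sigma_v / math.sqrt(w - 1.0)
    r = mu11 * math.sqrt(w - 1.0) / (sigma_u * sigma_v * c)
    return m_u, sigma_u, c, b, a, r
```

A convenient check is to generate the six moments from known parameters by (8) and to confirm that the routine returns the same parameters.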


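3. REGRESSION OF THE MEAN

First, let us observe that the function $F(u, v)$ may be put into the following forms:

$$F(u,v) = \frac{1}{\sqrt{2\pi}\,c\,(v-a)}\, e^{-\frac{1}{2c^2}\left[\log\frac{v-a}{b}\right]^2} \cdot \frac{1}{\sqrt{2\pi}\,\lambda\sqrt{1-r^2}}\, e^{-\frac{1}{2\lambda^2(1-r^2)}\left[u-l-\frac{r\lambda}{c}\log\frac{v-a}{b}\right]^2}$$

$$\phantom{F(u,v)} = \frac{1}{\sqrt{2\pi}\,\lambda}\, e^{-\frac{(u-l)^2}{2\lambda^2}} \cdot \frac{1}{\sqrt{2\pi}\,c\sqrt{1-r^2}\,(v-a)}\, e^{-\frac{1}{2c^2(1-r^2)}\left[\log\frac{v-a}{b}-\frac{rc}{\lambda}(u-l)\right]^2}.$$

Hence, the distribution of $u$ for a particular array of $v$ is normal:

$$\varphi(u,v) = \frac{1}{\sqrt{2\pi}\,\lambda\sqrt{1-r^2}}\, e^{-\frac{1}{2\lambda^2(1-r^2)}\left[u-l-\frac{r\lambda}{c}\log\frac{v-a}{b}\right]^2}, \tag{11}$$

and the distribution of $v$ for a particular array of $u$ is logarithmic:

$$\theta(u,v) = \frac{1}{\sqrt{2\pi}\,c\sqrt{1-r^2}\,(v-a)}\, e^{-\frac{1}{2c^2(1-r^2)}\left[\log\frac{v-a}{b}-\frac{rc}{\lambda}(u-l)\right]^2}. \tag{12}$$

To find the mean of $u$ for a particular value of $v$, we multiply $\varphi(u, v)$ by $u$ and integrate the resulting expression with respect to $u$ over the range from $-\infty$ to $\infty$:

$$\bar{u} = \int_{-\infty}^{\infty} u\,\varphi(u,v)\,du = l + \frac{r\lambda}{c}\log\frac{v-a}{b}, \tag{13}$$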
which is the regression equation of the mean of $u$ on $v$ and may be called the logarithmic regression equation.

Similarly, the regression equation of the mean of $v$ on $u$ is found to be:

$$\bar{v} = \int_a^{\infty} v\,\theta(u,v)\,dv = b\,e^{\frac{c^2(1-r^2)}{2}}\, e^{\frac{rc}{\lambda}(u-l)} + a, \tag{14}$$

which may be named the exponential regression equation.

Observe the following points:

(a) The regression curves (13) and (14) intersect at the point

$$u = l + \frac{r\lambda c}{2},\qquad v = b\,e^{c^2/2} + a = m_v.$$

(b) When $r = 0$, the curves become two straight lines:

$$\bar{u} = l = m_u,\qquad \bar{v} = b\,e^{c^2/2} + a = m_v,$$

which show that $\bar{u}$ is independent of $v$ and $\bar{v}$ is independent of $u$. We can also see this from the expression $F(u, v)$, which becomes

$$F(u,v) = \frac{1}{\sqrt{2\pi}\,\lambda}\, e^{-\frac{(u-l)^2}{2\lambda^2}} \cdot \frac{1}{\sqrt{2\pi}\,c\,(v-a)}\, e^{-\frac{1}{2c^2}\left[\log\frac{v-a}{b}\right]^2}$$

when $r = 0$. This is the condition for independence of $u$ and $v$ in a probability sense.

(c) When $r = 1$, these two regression curves coincide. This signifies that there exists a complete functional relationship between $u$ and $v$, namely:

$$\frac{u-l}{\lambda} = \frac{1}{c}\log\frac{v-a}{b}.$$

(d) As we have learned from the studies on the normal correlation surface, $r$ is the coefficient of correlation measuring the linear relationship between $x = \frac{u-l}{\lambda}$ and $y = \frac{1}{c}\log\frac{v-a}{b}$. Thus, it is also a measure of relationships (13) and (14) existing between $u$ and $v$. If we note that $r$ may be written as

$$r = \frac{\mu_{11}}{\sigma_u\sigma_v}\cdot\frac{(e^{c^2}-1)^{1/2}}{c} = \frac{\mu_{11}}{\sigma_u\sigma_v}\left(1 + \frac{c^2}{4} + \frac{5c^4}{96} + \cdots\right), \tag{15}$$

we see that $r$ is always greater in absolute value than $\mu_{11}/\sigma_u\sigma_v$, which would be the coefficient of correlation measuring the linear relationship between $u$ and $v$ if we treated the correlation surface of $u$ and $v$ as being normal.

The smaller the value of $c$ and $\alpha_3$, the smaller the difference between $r$ and $\mu_{11}/\sigma_u\sigma_v$. In fact, we can show, as we did for the one variable case, that as $c$ goes to zero the semi-logarithmic correlation surface approaches the normal correlation surface.

Incidentally, we may remark that the expression (15) is convenient for computing $r$.
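The regression curves (13) and (14) and the observations (a) to (c) can be checked numerically. The sketch below uses hypothetical function names and arbitrary illustrative parameter values; it is not taken from the original paper.

```python
import math

def mean_u_given_v(v, l, lam, c, b, a, r):
    """Logarithmic regression (13): the mean of u for a given v."""
    return l + (r * lam / c) * math.log((v - a) / b)

def mean_v_given_u(u, l, lam, c, b, a, r):
    """Exponential regression (14): the mean of v for a given u."""
    return a + b * math.exp(0.5 * c * c * (1.0 - r * r) + (r * c / lam) * (u - l))

def intersection(l, lam, c, b, a, r):
    """Point (a): where the two regression curves cross."""
    return l + 0.5 * r * lam * c, a + b * math.exp(0.5 * c * c)

if __name__ == "__main__":
    params = (1.0, 2.0, 0.4, 3.0, -1.0, 0.6)   # l, lam, c, b, a, r chosen for illustration
    u0, v0 = intersection(*params)
    print(u0, mean_u_given_v(v0, *params))
    print(v0, mean_v_given_u(u0, *params))
```

With $r = 1$ the two functions become inverses of each other, which is the complete functional relationship noted in (c).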


4. REGRESSION OF THE MOMENTS

Using the well-known formulae for the moments of the normal curve of error about the mean, we can find at once the $s$-th moment of $\varphi(u, v)$ about its mean:

$$M_{s:u} = \int_{-\infty}^{\infty} (u-\bar{u})^s\,\varphi(u,v)\,du = \frac{s!}{2^{s/2}\,(s/2)!}\,\lambda^s (1-r^2)^{s/2} \ \text{ if } s \text{ is even}, \tag{16}$$

$$M_{s:u} = 0 \ \text{ if } s \text{ is odd}.$$

This is the regression equation of the $s$-th moment of $u$ about
the mean on $v$. It follows that the $s$-th standard moment of $u$ for a given value of $v$ is:

$$\alpha_{s:u} = \frac{M_{s:u}}{M_{2:u}^{s/2}} = \frac{s!}{2^{s/2}\,(s/2)!} \ \text{ if } s \text{ is even}, \tag{17}$$

$$\alpha_{s:u} = 0 \ \text{ if } s \text{ is odd}.$$

Again, by the formulae given in Part I for the moments of the logarithmic distribution, we calculate the $s$-th moment of $\theta(u, v)$ about the point "a":

$$M'_{s:v} = \int_a^{\infty} (v-a)^s\,\theta(u,v)\,dv = b^s\, e^{\frac{s^2c^2(1-r^2)}{2}}\, e^{\frac{src}{\lambda}(u-l)}. \tag{18}$$

And the regression equation of the $s$-th moment of $v$ about the mean on $u$ is:

$$M_{s:v} = \int_a^{\infty} (v-\bar{v})^s\,\theta(u,v)\,dv = b^s\, e^{\frac{src}{\lambda}(u-l)}\, e^{\frac{sc^2(1-r^2)}{2}} \sum_{k=0}^{s} (-1)^k \binom{s}{k}\, e^{\frac{(s-k)(s-k-1)c^2(1-r^2)}{2}}. \tag{19}$$

The $s$-th standard moment of $v$ for a particular value of $u$ is, therefore,

$$\alpha_{s:v} = \frac{M_{s:v}}{M_{2:v}^{s/2}} = \frac{\displaystyle\sum_{k=0}^{s} (-1)^k \binom{s}{k}\, e^{\frac{(s-k)(s-k-1)c^2(1-r^2)}{2}}}{\left[e^{c^2(1-r^2)}-1\right]^{s/2}}. \tag{20}$$

Having obtained the expressions for the regressions of the moments of one variable on the other, we shall now proceed to discuss the scedasticity, clisy and synagic* of the semi-logarithmic correlation surface.


5. SCEDASTICITY 


From formula (16), we have the regression of the second moment of $u$ about the mean on $v$:

$$M_{2:u} = \lambda^2(1-r^2), \tag{21}$$

which is the same as in the case of the normal correlation surface, except that $r$ now does not measure the linear relationship between $u$ and $v$. Since (21) is free of $v$, the semi-logarithmic correlation surface is homoscedastic, so far as the variable $u$ is concerned.

From the standpoint of estimation, we may also interpret expression (21) to mean that when we estimate the mean value of $u$ for a particular value of $v$, the error of estimation will be reduced if we use formula (13) instead of the mean of the marginal distribution of $u$. The standard deviation of the marginal distribution of $u$ is $\lambda$, while that of (13) is only $\sqrt{M_{2:u}} = \lambda\sqrt{1-r^2}$, as shown by (21).

The second moment of $v$ for a particular value of $u$ is given by (19):

$$M_{2:v} = b^2\, e^{\frac{2rc}{\lambda}(u-l)}\, e^{c^2(1-r^2)}\left[e^{c^2(1-r^2)}-1\right], \tag{22}$$

which is not independent of $u$. So, the semi-logarithmic correlation surface is not homoscedastic for $v$. Actually $\sqrt{M_{2:v}}$, the standard deviation of the distribution of $v$ for a given $u$, increases with $u$.

However, the relative dispersion or relative error for the distributions of $v$ for different values of $u$ is a constant, namely:

$$\frac{\sqrt{M_{2:v}}}{\bar{v}-a} = \left[e^{c^2(1-r^2)}-1\right]^{1/2}. \tag{23}$$

Thus, by using formula (14) to estimate the mean value of $v$ for a given value of $u$ instead of employing the mean of the marginal distribution of $v$, we reduce the relative error of estimation, for the relative error of the marginal distribution is $(e^{c^2}-1)^{1/2}$. The reduction of relative error is much pronounced when $r$ is large. In fact, the greater $r$ is, the greater the reduction of relative error and the better the estimation. Hence, $r$ measures the degree of relationships (13) and (14) between $u$ and $v$.

* The term "synagic" was used by S. D. Wicksell to mean the regression of the kurtosis. ("The Correlation Function of Type A, and the Regression of its Characteristics", Kungl. Svenska Vetenskapsakademiens Handlingar, Band 58, Nr. 3; Meddelanden fran Lunds Observatorium, Ser. II, Nr. 17, 1917)
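The statements on scedasticity can be verified by computing (21), (22) and (23) directly. This is a sketch with hypothetical function names; the conditional mean of $v$ measured from "a" is taken from (14).

```python
import math

def conditional_sd_u(lam, r):
    """Square root of (21): standard deviation of u for a given v."""
    return lam * math.sqrt(1.0 - r * r)

def conditional_second_moment_v(u, l, lam, c, b, r):
    """Formula (22): second moment of v about its mean for a given u."""
    k = c * c * (1.0 - r * r)
    return b * b * math.exp(2.0 * r * c * (u - l) / lam) * math.exp(k) * (math.exp(k) - 1.0)

def conditional_mean_v_minus_a(u, l, lam, c, b, r):
    """Mean of v for a given u, measured from the lower limit a, from (14)."""
    return b * math.exp(0.5 * c * c * (1.0 - r * r) + r * c * (u - l) / lam)

def relative_error(c, r=0.0):
    """Formula (23); with r = 0 it gives the relative error of the marginal distribution."""
    return math.sqrt(math.exp(c * c * (1.0 - r * r)) - 1.0)

if __name__ == "__main__":
    l, lam, c, b, r = 1.0, 2.0, 0.4, 3.0, 0.6   # illustrative values only
    for u in (-1.0, 1.0, 3.0):
        sd = math.sqrt(conditional_second_moment_v(u, l, lam, c, b, r))
        print(u, sd, sd / conditional_mean_v_minus_a(u, l, lam, c, b, r))
    print(relative_error(c, r), relative_error(c))
```

The standard deviation of $v$ changes from array to array, while its ratio to the array mean measured from "a" stays fixed and is smaller than the marginal value.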


6. CLISY AND SYNAGIC

Now, we shall study the clisy and synagic of the semi-logarithmic correlation surface or the regression of the skewness and kurtosis of one variable on the other.

The skewness and kurtosis of any distribution represented by $\varphi(u, v)$ as measured by $\alpha_{3:u}$ and $\eta_u = \alpha_{4:u} - 3$ are, of course, equal to zero, since it is a normal distribution. But the skewness and kurtosis of any distribution of $v$ for particular values of $u$, according to formula (20), are given by:

$$\alpha_{3:v} = \left(e^{c^2(1-r^2)}-1\right)^{1/2}\left(e^{c^2(1-r^2)}+2\right), \tag{24}$$

$$\eta_v = \left(e^{c^2(1-r^2)}-1\right)\left(e^{3c^2(1-r^2)}+3e^{2c^2(1-r^2)}+6e^{c^2(1-r^2)}+6\right), \tag{25}$$

which are two constants. Since the skewness and kurtosis of the marginal distribution of $v$ are given by $(e^{c^2}-1)^{1/2}(e^{c^2}+2)$ and $(e^{c^2}-1)(e^{3c^2}+3e^{2c^2}+6e^{c^2}+6)$, respectively, we may say that the distribution of $v$ for each array of $u$ has smaller skewness and kurtosis, and is, therefore, closer to the normal distribution than the marginal distribution of $v$. And it is more so, when $r$ is near unity.
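Formulae (24) and (25) differ from the marginal expressions only in that $c^2$ is replaced by $c^2(1-r^2)$, which the following sketch makes explicit. The function name `conditional_shape_v` is hypothetical.

```python
import math

def conditional_shape_v(c, r):
    """Skewness (24) and kurtosis (25) of v for any array of u.

    With r = 0 the same expressions give the marginal skewness and kurtosis of v.
    """
    w = math.exp(c * c * (1.0 - r * r))
    alpha3 = math.sqrt(w - 1.0) * (w + 2.0)
    eta = (w - 1.0) * (w**3 + 3.0 * w**2 + 6.0 * w + 6.0)
    return alpha3, eta

if __name__ == "__main__":
    c = 0.5   # illustrative value only
    for r in (0.0, 0.5, 0.9, 1.0):
        print(r, conditional_shape_v(c, r))
```

Both quantities shrink steadily as $r$ approaches unity and vanish at $r = 1$, where each array of $v$ collapses to a single value.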


7. REGRESSION OF OTHER CHARACTERISTICS 

In this section, we shall give the regression of other charac- 
teristics, such as the median, the geometric mean, the mode, the 
points of inflection and the finite limit. 

The regression equations of the median and the mode of u
on v are, of course, the same as that of the mean of u on v,
because the distribution of u for a given v is normal. The points
of inflection of this distribution are points one standard
deviation, i.e., $\lambda\sqrt{1-r^{2}}$, to the left and the right
of the mean, as this is again a well-known property of the normal
distribution.

The regression equation of the median and the geometric 
mean of v on & is given by 

a-f 


ff —— 


a 
Ma:y * "giv = de “e 


or Fs Lagf- —— ad = f (26) 


which differs from the regression equation of the mean or the
median of u on v only in that the constant factor r is in the
left member of equation (13) but in the right member of (26).
The mode of v for special values of u is


M7). « 0e el $4) -c401-r), : (27) 


The regression equations of the points of inflection of v on
u are given by


pecemmpeannatnanen 
; fe [fe ari za | 


> At a ale 


a: V7 


which are not free of w. 
Finally, we may add that the finite limit of any distribution
of v for a particular array of u is the same as that of the
marginal distribution of v.


8. AN ILLUSTRATION 

For illustrating the application of the semi-logarithmic
correlation surface, we take the correlation table of heights and
weights of 11,382 school boys between 5 and 14 years of age in
Glasgow from L. Isserlis’s paper, “On the Partial Correlation
Ratio”.* We shall treat the height as the variable u and the
weight as the variable v. Thus, the marginal distribution of
heights is supposed to be normal, while that of weights is
supposed to be logarithmic.

Letting the class marks, 49 inches and 56 pounds, be the
origins of u and v, respectively, and the class intervals be the
respective units, we calculate the moments of this correlation
surface:**

mM, =~ 511861 class intervals 
Gg, = 1.7631 class intervals 
~, >= 877 
bgg 4. JOIS 
m, =-.205412 class intervals 
=2I5I7E61 class intervals 
G3 = .IPID 
Gog =S 1221 
A, =4.2OIB7S 


from which we deduce the following parameters by formulae (10):





-~ 51186] class intervals 


Y= 

A= 1.7631 class intervals 
w= L,0379 

e- .1929 

a «-13.45 class intervals 
£213.00 class intervals 
r= 9340 


* Biometrika. Vol. XI, 1915, p. 65. 
** Note that these numerical results differ somewhat from those given
by L. Isserlis, because we have applied Sheppard's corrections to the raw
moments.


TABLE VI
Correlation Table of Heights and Weights of 11,382 School Boys
between 5 and 14 Years of Age in Glasgow
(Original Measurements of Heights Made to Nearest Inch;
Original Measurements of Weights Made to Nearest Pound)

Rows: Weight (Pounds). Columns: Height (Inches), in the classes
30-32, 33-35, 36-38, 39-41, 42-44, 45-47, 48-50, 51-53, 54-56,
57-59, 60-62, 63-65, followed by the row total.

24- 28 4 9 2 1 16 
29- 33 +’ 4 62 @ 3 1 136 
34- 38 16 220 414 72 6 728 
39- 43 1 3 51 617 697 95 1 1 1476 
44- 48 1 7 122 875 603 38 8 1 1655 
49- 53 4 12 249 988 411 33 5 4 1706 
54- 58 1 3 1 17 #436 905 171 11 4 3 1552 
59- 63 1 1 39 630 568 5i1 6 1 1297 
64- 68 1 8 161 621 206 Ss 2 F 1004 
69- 73 1 35 374 340 24 2 776 
74- 78 3 106 335 7% 5 525 
79- 83 2 22 1200 93 4 #1 242 
84- 88 1 8 32 87 8 2 138 
89- 93 _ -— = mm! 66 
94. 98 a! a oe 37 
99-103 5; ti 3 19 
104-108 1 5 1 7 
109-113 1 1 
114-118 0 
119-123 1 1 


Total ~ 8 72 350 1193 1914 2178 2196 1913 1115 361 69 13 11,382 


With these parameters, the correlation surface of heights and 
weights is determined. Now, we shall examine the regression 
curves of this correlation surface. 

Inserting the computed parameters in formulae (13) and
(14), we obtain the regression equations of the mean height on
weight and the mean weight on height. In Tables VII and VIII,
we have the mean heights for specified weights and the mean
weights for specified heights. We see, from these tables and from
Figures III and IV, that the agreement between the theoretical and
observed results is excellent. In some extreme classes the
deviations of the observed values from the theoretical values are
more pronounced. But these classes comprise only a small fraction
of the total number of cases.
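
The theoretical mean heights can be approximately reproduced from the printed parameters. The sketch below is a reading under stated assumptions, not formula (13) verbatim: it takes the usual regression of a normal variable on the logarithm of the other, u = m + (rλ/c)[log(v − a) − log β], reads the parameter list as m = −.511861, λ = 1.7631, c = .1929, a = −13.45, β = 13.00, r = .9340 (the symbol names m, λ and β are guesses, since the list is partly illegible), and converts class intervals back to inches and pounds.

```python
import math

# Parameters in class-interval units, as read from the parameter list.
# The names m, lam and beta are assumptions made for this sketch.
m, lam = -0.511861, 1.7631          # mean and standard deviation of height
a, beta, c = -13.45, 13.00, 0.1929  # finite limit, scale and log-SD of weight
r = 0.9340

def mean_height(weight_lb):
    """Assumed regression of mean height (inches) on weight (pounds)."""
    v = (weight_lb - 56.0) / 5.0                  # origin 56 lb, 5-lb classes
    u = m + (r * lam / c) * (math.log(v - a) - math.log(beta))
    return 49.0 + 3.0 * u                         # origin 49 in, 3-in classes

for w in (26, 56, 66, 121):                       # class marks in pounds
    print(w, round(mean_height(w), 2))
```

For these class marks the results lie within about 0.05 inch of the theoretical entries 33.2, 48.3, 51.9 and 65.7 of Table VII, which suggests the reading of the parameters is at least close.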
Now, we go further to investigate the scedasticity of the
correlation surface of heights and weights. According to the
theory, for any particular weight the standard deviation of heights
should be a constant and equal to $\lambda\sqrt{1-r^{2}} = 1.8893$ inches.
This is much less than the standard deviation of the marginal
distribution of the heights, which is 5.2893 inches. That 1.8893 inches
is quite close to the observed standard deviations is shown by
Table IX and Figure V.
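
As a quick arithmetic check, the constant array standard deviation follows from the marginal standard deviation and r. The sketch assumes the usual form σ√(1 − r²) for a normally distributed variable and uses r = .9340 as read from the parameter list.

```python
import math

sigma_heights = 5.2893   # inches, marginal standard deviation quoted in the text
r = 0.9340               # correlation parameter as read from the parameter list

# Usual homoscedastic array standard deviation of a normal variable.
array_sd = sigma_heights * math.sqrt(1.0 - r * r)
print(round(array_sd, 2))   # 1.89
```

The small difference from 1.8893 in the fourth decimal comes from using r rounded to four places.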

The theory asserts that the dispersion of weights is not the 
same for different heights. But for all arrays of heights the 
relative dispersion or relative error of weights is independent of 
heights. 


TABLE VII

The Mean Heights for Specified Weights

Weight (Pounds)      Observed      Theoretical

   24- 28              34.4           33.2
   29- 33              36.5           36.4
   34- 38              39.3           39.3
   39- 43              41.8           41.9
   44- 48              44.0           44.2
   49- 53              46.4           46.4
   54- 58              48.5           48.3
   59- 63              50.5           50.2
   64- 68                             51.9
   69- 73                             53.5
   74- 78                             55.0
   79- 83                             56.4
   84- 88                             57.8
   89- 93                             59.1
   94- 98                             60.3
   99-103                             61.5
  104-108                             62.6
  109-113                             63.6
  114-118                             64.7
  119-123                             65.7
[FIGURE III. Regression curve of mean height on weight, with observed
values marked. Axes: weight (pounds), mean height (inches).]

[FIGURE IV. Regression curve of mean weight on height, with observed
values marked. Axes: height (inches), mean weight (pounds).]

[FIGURE V. Curve of scedasticity of height on weight, with observed
values marked. Axes: weight (pounds), σ of heights (inches).]

[FIGURE VI. Curve of scedasticity of weight on height, with observed
values marked. Axes: height (inches), σ of weights (pounds).]
TABLE VIII

The Mean Weights for Specified Heights

Height (Inches)      Observed      Theoretical

   30-32               29.8           26.0
   33-35               32.5           30.0
   36-38               36.4           34.4
   39-41                              39.3
   42-44                              44.7
   45-47                              50.8
   48-50                              57.5
   51-53                              64.8
   54-56                              73.1
   57-59                              82.1
   60-62                              92.2
   63-65



According to formula (23), for any specified height, the relative
error of weights is 7.6%, which is much smaller than the relative
error of the marginal distribution of weights, which is
$(e^{c^{2}}-1)^{1/2} = .1957$.


Both the theoretical and observed absolute errors or standard
deviations of weights for specified heights have been calculated
and are shown in Table X and Figure VI. The agreement between
the theoretical and observed dispersions is not as good as for the
regression of the mean weight on height. It should be noted here
that theoretically the standard deviations of weights for heights
over 76 inches are greater than the standard deviation of the
marginal distribution of weights, which is 12.8905 pounds.

In interpreting the standard deviations of weights for
particular heights, we must bear in mind that the distribution of
weights for any given height is not normal, but logarithmic.
Hence, a proper interpretation of the dispersion of weights for a
given height can be made only with reference to the skewness,
measured by the third standard moment of weights, which,
according to the theory, is a constant for all different heights. The
theoretical third standard moment of the distribution of weights
for any given height, as we shall see later, is approximately .2.
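
The order of magnitude of this third standard moment can be recovered from the relative error alone. The sketch assumes the standard relations for a logarithmic distribution: if w = e^{log-variance}, the relative error is (w − 1)^{1/2} and the skewness is (w − 1)^{1/2}(w + 2). The function name is illustrative; the input 0.076 is the 7.6% quoted in the text.

```python
import math

def skewness_from_relative_error(eps):
    """Third standard moment of a logarithmic distribution whose relative
    error (SD / mean measured from the finite limit) is eps.
    With w = 1 + eps**2, gamma_1 = (w - 1)**0.5 * (w + 2) = eps * (3 + eps**2)."""
    return eps * (3.0 + eps * eps)

print(round(skewness_from_relative_error(0.076), 3))
```

A relative error of 7.6% gives a skewness of roughly .23, in line with the "approximately .2" stated above.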





TABLE IX

The Standard Deviations of Heights for Specified Weights

                     Standard Deviation of Heights (Inches)
Weight (Pounds)         Observed      Theoretical

   24- 28                 3.52           1.89
   29- 33                 2.40           1.89
   34- 38                 1.91           1.89
   39- 43                 2.42           1.89
   44- 48                 1.91           1.89
   49- 53                 2.07           1.89
   54- 58                 2.04           1.89
   59- 63                 1.81           1.89
   64- 68                 1.87           1.89
   69- 73                 1.79           1.89
   74- 78                 1.92           1.89
   79- 83                 1.95           1.89
   84- 88                 2.18           1.89
   89- 93                 2.01           1.89
   94- 98                 1.86           1.89
   99-103                 1.62           1.89
  104-108                 2.34           1.89
  109-113                 0              1.89
  114-118                 -              1.89
  119-123                 0              1.89

TABLE X

The Standard Deviations of Weights for Specified Heights

                     Standard Deviation of Weights (Pounds)
Height (Inches)         Observed      Theoretical

   30-32                  4.6            2.8
   33-35                  4.5            3.1
   36-38                  4.0            3.5
   39-41                  3.5            3.8
   42-44                  3.6            4.3
   45-47                  4.2            4.7
   48-50                  4.8            5.2
   51-53                  5.9            5.8
   54-56                  6.3            6.4
   57-59                  8.4            7.1
   60-62                 12.5            7.9
   63-65                 14.8            8.7

Thus, from Table II in Part I, we find that the probability that
any weight will be at most one standard deviation above or below
the mean weight for a given height is .6839 instead of .6826, as
in the case of the normal distribution. The difference between
.6839 and .6826 is slight but should not be overlooked. Moreover,
the difference would not be so small, if the skewness were
larger.

Another thing we must observe is that since the standard
deviation of weights for a given height increases with height, the
probability that for a given height the weight will differ from the
mean weight for that height by, say, at most one pound is not
the same for all different heights, although the probability that
for a given height the weight will differ from the mean weight
for that height by at most one standard deviation is the same for
all different heights. The former probability is greater for smaller
heights.
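
Both statements can be illustrated with a short computation. The sketch assumes that the weights in each array follow a logarithmic (log-normal) law measured from the finite limit, with the relative error of 7.6% quoted earlier; the two standard deviations, 3.5 and 8.7 pounds, are theoretical entries of Table X. The function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(delta, sd, eps=0.076):
    """P(|weight - array mean| <= delta) for a logarithmic array with
    standard deviation sd (pounds) and relative error eps (assumed)."""
    s = math.sqrt(math.log(1.0 + eps * eps))   # SD of the logarithm
    mean = sd / eps                            # mean measured from the finite limit
    mu = math.log(mean) - 0.5 * s * s          # mean of the logarithm
    lower, upper = mean - delta, mean + delta
    return phi((math.log(upper) - mu) / s) - phi((math.log(lower) - mu) / s)

for sd in (3.5, 8.7):                          # a short and a tall height class
    print(sd, round(prob_within(1.0, sd), 3), round(prob_within(sd, sd), 4))
```

The one-pound probability is larger for the array with the smaller standard deviation, while the one-standard-deviation probability is the same for both.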

The agreement between the theoretical and observed clisy and
synagic is, of course, not expected to be close. Theoretically, the
distributions of weights for specified heights should all have
$\gamma_1 = \alpha_3 = .23$ and $\gamma_2 = \alpha_4 - 3 = .09$. Five observed values
of $\gamma_1$ and $\gamma_2$ are shown below:









Height            Observed Skewness       Observed Kurtosis
(Inches)             of Weights              of Weights
The rather large deviations of the observed $\gamma_2$ in the first
class from the theoretical $\gamma_2$ and the observed $\gamma_1$ in the last
class from the theoretical $\gamma_1$ may be accounted for by the fact
that only 350 and 69 observations are included in the first and
the last classes, respectively.

The observed marginal distribution of heights is very
symmetric but is markedly leptokurtic, since its $\alpha_4$ is about 2.5093.
Hence, the fit given by a normal curve is not quite satisfactory,
as we can see from Table XI.

The observed marginal distribution of weights is quite skew 
and platykurtic. As shown by Table XII, the agreement between 
the observed distribution and the theoretical logarithmic distribu- 
tion is not very close. 




TABLE XI 


Relative Frequency Distribution of Heights of 11,382 School Boys 
between 4 and 15 Years of Age in Glasgow 


Class Observed Theoretical 
Limits Relative Relative Frequency 
(Inches ) Frequency (Normal Curve) 


TABLE XII 


Relative Frequency Distribution of Weights of 11,382 School Boys between 
4 and 15 Years of Age in Glasgow 


Class            Observed          Theoretical
Limits           Relative          Relative Frequency
(Pounds)         Frequency         (Logarithmic Curve)


84- 88 
89- 93 
94- 98 
99-103 
104-108 
109-113 
114-118 
119-123 


Total 


In closing, we may say that the semi-logarithmic correlation
surface is not at all uncommon in practice, and the method
developed here for treating it should prove rather useful. In fact,
our investigation opens up a new way for determining exponential
and logarithmic regression curves.

A SIMPLE METHOD FOR CALCULATING
MEAN SQUARE CONTINGENCY


By


Elmer B. Royer
The Ohio State University


If we wish to test for a possible relationship between two
variables which are not quantitatively measurable, but each of
which has two or more categories, the usual procedure is to make
a two-way table, giving the frequencies of all the possible
combinations.

Assuming independence between the two variables, a second
table is built, making the frequencies of each column proportional
to the frequencies in the column of row totals. When this is done,
each of the row frequencies is found to be proportional to the row
of column totals.

The deviation of the actual frequency for a compartment as
found in Table 1 from the expected frequency as found in Table
2 is squared, and this square is divided by the expected frequency.
These quotients are summed over the entire table, giving us
Chi-square.

The calculation of Chi-square can be made much simpler by
simplifying the formula.

The probability of the joint occurrence of two independent events
is the product of their separate probabilities. Thus the probability
of the joint occurrence of Category 3 of the first classification and
Category d of the second classification is the probability of the
occurrence of Category 3 (which is taken to be the fraction of the
total number of cases which fall in Category 3), times the
probability of the occurrence of Category d. The expected frequency
of the compartment is this product of separate probabilities,
multiplied by the total number of cases. If we let $f_a$ be the actual
compartment frequency, $f_e$ the expected compartment frequency,
$f_r$ the total frequency for the row, $f_c$ the total frequency for the
column, and $N$ the number of cases, we may write,

$$f_e = \frac{f_r f_c}{N} \qquad (1)$$

Also,

$$\chi^{2} = \sum \frac{(f_a - f_e)^{2}}{f_e}
          = \sum \frac{f_a^{2} - 2 f_a f_e + f_e^{2}}{f_e}
          = \sum \frac{f_a^{2}}{f_e} - 2\sum f_a + \sum f_e \qquad (2)$$

Since the table must sum to $N$, whether we have filled it with
actual frequencies, or with expected frequencies, (2) reduces to,

$$\chi^{2} = \sum \frac{f_a^{2}}{f_e} - N$$

Substituting for $f_e$ from (1),

$$\chi^{2} = N \sum \frac{f_a^{2}}{f_r f_c} - N
          = N\left(\sum \frac{f_a^{2}}{f_r f_c} - 1\right) \qquad (3)$$

In order to illustrate our method, we shall compute Chi-square
for Table 18 on page 86 of Fisher’s Statistical Methods for
Research Workers. Our computations are presented in the following
table.

                                                              Total

Coupling
  F1 Males            88         82         75         60       305
                    7744       6724       5625       3600
                 25.3902    22.0459    18.4426    11.8033

  F1 Females          38         34         30         21       123
                    1444       1156        900        441
                 11.7398     9.3984     7.3171     3.5854

Repulsion
  F1 Males           115         93         80        130       418
                   13225       8649       6400      16900
                 31.6388    20.6914    15.3110    40.4306

  F1 Females          96         88         95         79       358
                    9216       7744       9025       6241
                 25.7430    21.6313    25.2095    17.4330

Frequency Total      337        297        280        290      1204

Product Total    94.5118    73.7670    66.2802    73.2523

Quotient         .280450    .248374    .236715    .252594  1.018133


The actual frequency is the first entry in each compartment.
The square is read from a table and written directly beneath the
frequency. The reciprocal of 305 is put into the keyboard of a
calculator and multiplied in turn by each of the squares in the
first row. The products make the third entries in the
compartments.
These products are summed by columns and the sums divided
by the frequency totals of the corresponding columns. These
quotients are summed horizontally. This sum is $\sum \frac{f_a^{2}}{f_r f_c}$ and can
be substituted in (3). For our example,

$$\chi^{2} = 1204\,(1.018133 - 1.000000) = 21.832$$

This answer agrees exactly with the answer obtained by 
Fisher in his Table 19, page 87. The advantages of this method 
are two-fold: (1) There is considerable saving of labor; (2) 
with the simplification of calculations, we have greatly reduced 
the danger of errors caused by dropping of decimal places. 










